CS412 – Information Access & Mining

Credits: 20
Level: 4
Prerequisites: The class has no formal prerequisites, although experience of CS209 User & Data Modelling would be helpful.
Availability: Semesters 1 and 2
Elective: No
Contact hours: Lectures 22 | Tutorials 0 | Labs 22
Non-contact hours: Assignments 82 | Self study 74
Assessment: The class is assessed by coursework (30%) and a written examination (70%). Formative exercises are set weekly, and there is a substantial piece of summative coursework.
Lecturer: Dr Dmitri Roussinov

Aims and Objectives

This class will enable the student to understand the fundamentals of information access and information mining. The class will cover a range of techniques for extracting information from textual and non-textual resources, modelling the information content of resources, detecting patterns within information resources and making use of these patterns.

Learning Outcomes

On completion of the class, a student should be able:

  • to demonstrate a knowledge of the issues involved in representing information stored in electronic form, including text, images, video and speech
  • to demonstrate a knowledge of the techniques for accessing information based on such representations
  • to demonstrate a knowledge of the broad range of techniques for evaluating information access systems
  • to demonstrate a knowledge of statistical approaches for detecting patterns within information
  • to demonstrate a knowledge of the practical applications of large-scale information mining approaches
  • to demonstrate an awareness of the current challenges of constructing large-scale information access systems
  • to demonstrate practical competence in the techniques covered by the class

Syllabus

Introduction: information access, the Web and electronic archives, including commercial and non-commercial applications.

Textual Information Storage & Retrieval: indexing, querying, ranking and retrieval. Document analysis and representation. Information Retrieval models (e.g. Boolean, Vector Space, Probabilistic, Fuzzy). Relevance feedback and clustering.

Textual Information Filtering: machine learning for information access. Content-based and collaborative filtering.

Image Retrieval: retrieval of images by semantic and visual features. Manual image indexing. Indexing by visual features (e.g. colour, texture, shape). Storage and retrieval.

Video Retrieval: video segmentation, representation, storage and retrieval. Document visualisation: presentation of multimedia retrieval results.

Evaluation of Information Access systems: system-oriented evaluation, including TREC (the Text REtrieval Conference). User- and task-oriented evaluation.

Web search engines: indexing and retrieval of multimedia information on the Web. Structured document indexing and retrieval. Analysis of link connectivity.

Information mining: detecting patterns within data. Techniques for information mining including clustering, classification, and association rule learning. Metrics for information mining.

Indicative Reading*

* This list is indicative only – the class lecturer may recommend alternative reading material. Please do not purchase any of the reading material listed below until you have confirmed with the class lecturer that it will be used for this class.

Finding Out About. R. Belew. Cambridge University Press, 2001.

Information Retrieval in Practice. B. Croft, D. Metzler and T. Strohman. Pearson Education, 2009.

Information Retrieval: Algorithms and Heuristics. D. A. Grossman and O. Frieder. Springer, 2004.

Introduction to Information Retrieval. C. D. Manning, P. Raghavan and H. Schütze. Cambridge University Press, 2008.