CS412 – Information Access & Mining

Credits: 20
Level: 4
Prerequisites: CS104 Information and Information Systems
Semester: 2
Elective: No
Contact: Lectures: 20 | Tutorials: 0 | Labs: 20 | Assignments: 80 | Self study: 80
Assessment: The class will be assessed by means of coursework (30%) and written examination (70%). There will be formative exercises on a weekly basis and a substantial piece of summative coursework.

Dr Dmitri Roussinov

Aims and Objectives

This class will enable the student to understand the fundamentals of information access and information mining. The class will cover a range of techniques for extracting information from textual and non-textual resources, modelling the information content of resources, detecting patterns within information resources and making use of these patterns. It will focus particularly on unstructured textual information found on the web.

Learning Outcomes

On completion of the class, a student should be able:

  • to demonstrate a knowledge of the issues involved in representing information stored in electronic form, including text, images, video and speech
  • to demonstrate a knowledge of techniques for the access of information based on such representations
  • to demonstrate a knowledge of the broad range of techniques for evaluating information access systems
  • to demonstrate a knowledge of statistical and machine learning approaches for detecting patterns within information
  • to demonstrate a knowledge of the practical applications of large-scale information mining approaches
  • to demonstrate an awareness of the current challenges of constructing large-scale information access systems

Syllabus


Introduction, Information Access, the Web and Electronic Archives, including commercial and non-commercial applications.

Textual Information Storage & Retrieval: indexing, querying, ranking and retrieval. Document analysis and representation. Information Retrieval models (e.g. Boolean, Vector Space, Probabilistic). Relevance feedback and clustering.

Textual Information Filtering: machine learning and information access. Content-based and collaborative filtering.

Image Retrieval: retrieval of images by semantic and visual features.

Video Retrieval: video segmentation, representation, storage and retrieval.

Document visualisation: presentation of multimedia retrieval results.

Web search engines: indexing and retrieval of multimedia information on the Web. Structured document indexing and retrieval. Analysis of link connectivity.

Information mining: detecting patterns within data. Techniques for information mining including clustering, classification, and association rule learning. Metrics for information mining.

Applications of deep neural networks to natural language processing and computer vision.

Indicative Reading*

* This list is indicative only – the class lecturer may recommend alternative reading material. Please do not purchase any of the reading material listed below until you have confirmed with the class lecturer that it will be used for this class.

Finding Out About. R. Belew. Cambridge University Press, 2001

Search Engines: Information Retrieval in Practice. B. Croft, D. Metzler and T. Strohman. Pearson Education, 2009

Information Retrieval: Algorithms and Heuristics. D. A. Grossman and O. Frieder. Springer, 2004

Introduction to Information Retrieval. C. D. Manning, P. Raghavan and H. Schütze. Cambridge University Press, 2008

Deep Learning. Y. LeCun, Y. Bengio and G. Hinton. Nature, volume 521, pages 436–444, 2015