CS982 – Big Data Technologies

Credits 20
Level 5
Semester 1
Prerequisites N/A
Availability Possible elective
Contact Lectures: 20 hours | Labs: 20 hours
Homework / Private Study: 160 hours
Assessment One individual assignment worth 50%, and a final 2-hour examination worth 50%.
Resit TBC
Lecturer Dr Martin Halvey

General Aims

The aim of this module is to endow students with:

  • an understanding of the new challenges posed by the advent of big data, as they relate to its modelling, storage, and access;
  • an understanding of the key algorithms and techniques which are embodied in data analytics solutions;
  • exposure to a number of different big data technologies and techniques, showing how they can achieve efficiency and scalability while also addressing design trade-offs and their impacts.

Learning Outcomes

After completing this module participants will be able to:

  • understand the fundamentals of Python to enable the use of various big data technologies;
  • understand how classical statistical techniques are applied in modern data analysis;
  • understand the potential application of data analysis tools for various problems and appreciate their limitations;
  • be familiar with a number of different cloud NoSQL systems and their design and implementation, showing how they can achieve efficiency and scalability while also addressing design trade-offs and their impacts;
  • be familiar with the MapReduce programming paradigm, and be able to write programs that execute on massively parallel cloud-based infrastructures.

Syllabus

  • Introduction to Python;
  • Introduction to R objects, data types and descriptive statistics;
  • Quantitative methods for data analysis and knowledge extraction including classification and regression, clustering, association rules, Bayesian approaches, decision trees;
  • Overview of various NoSQL cloud storage systems such as document stores like MongoDB, column stores like Cassandra and graph databases like Neo4j;
  • Distributed data processing with Hadoop and MapReduce.

Recommended Text/Reading*

* This list is indicative only – the class lecturer may recommend alternative reading material. Please do not purchase any of the reading material listed below until you have confirmed with the class lecturer that it will be used for this class.

Learning Python. Lutz, M. O’Reilly Media, Inc. 2013. ISBN-13: 978-1449355739

R Manuals at http://www.r-project.org

Hadoop: The Definitive Guide. White, T. 4th edition, O’Reilly Media, Inc. 2015.

Big Data Analytics: From Strategic Planning to Enterprise Integration with Tools, Techniques, NoSQL and Graph. Loshin, D. Elsevier. 2013. ISBN-13: 978-0124173924