CS982 - Big Data Technologies

Credits	20
Level	5
Semester	Semester 1
Availability	Possible elective
Prerequisites	N/A
Learning Activities Breakdown	Lectures: 20 hours | Labs: 20 hours | Homework / Private Study: 160 hours
Items of Assessment	2
Assessment	Coursework (50%) and Final Examination (50%)
Resit	Exam (100%)
Lecturers	Joseph El Gemayel, William Bell

The objectives of the module are to:

Provide students with a comprehensive understanding of the challenges and opportunities presented by big data, including its modelling, storage, and access.
Equip students with a critical understanding of the theoretical foundations and practical applications of key algorithms and techniques used in data analytics.
Develop students’ ability to analyse the challenges involved in manipulating large data samples, including those related to storage, access, and processing.
Provide students with knowledge of relational and NoSQL databases, including their structures, use cases, and limitations.
Introduce students to tools and frameworks used for distributed data processing, highlighting their design principles and practical use.
Enable students to design and implement a basic data processing pipeline that integrates theoretical understanding with practical tools.

After completing this module, students will be able to:

Apply fundamental Python programming skills to engage with a range of big data technologies and tools.
Explain and evaluate the use of classical statistical techniques in modern data analysis contexts.
Assess the suitability of different data analysis methods and technologies for specific problem domains, considering both their capabilities and limitations.
Describe and compare relational and NoSQL database models, including schema design considerations and associated trade-offs.
Evaluate distributed file systems and data processing frameworks in terms of scalability, fault tolerance, and performance.
Design and implement a basic data processing pipeline using appropriate tools and technologies.

The module covers the following topics:

Fundamentals of Python programming for data analysis, including data structures, libraries, and scripting for automation.
Overview of supervised learning techniques for classification and regression, and of unsupervised techniques for clustering, with emphasis on both theoretical foundations and practical application.
Overview of storage solutions, including relational and NoSQL databases, distributed file systems, and cloud storage.
Distributed data processing using current data analysis tools, pipeline approaches and dashboards.
