CS982 - Big Data Technologies
Credits | 20 |
Level | 5 |
Semester | Semester 1 |
Availability | Possible elective |
Prerequisites | N/A |
Learning Activities Breakdown | Lectures: 20 hours | Labs: 20 hours | Homework / Private Study: 160 hours |
Items of Assessment | 2 |
Assessment | Coursework (50%) and Final Examination (50%) |
Lecturer | Joseph El Gemayel, William Bell |
Aims and Objectives
The objectives of the module are to:
- Provide students with a comprehensive understanding of the challenges and opportunities presented by big data, including its modelling, storage, and access.
- Equip students with a critical understanding of the theoretical foundations and practical applications of key algorithms and techniques used in data analytics.
- Develop students’ ability to analyse the challenges involved in manipulating large data samples, including those related to storage, access, and processing.
- Provide students with knowledge of relational and NoSQL databases, including their structures, use cases, and limitations.
- Introduce students to tools and frameworks used for distributed data processing, highlighting their design principles and practical use.
- Enable students to design and implement a basic data processing pipeline that integrates theoretical understanding with practical tools.
Learning Outcomes
After completing this module, students will be able to:
- Apply fundamental Python programming skills to engage with a range of big data technologies and tools.
- Explain and evaluate the use of classical statistical techniques in modern data analysis contexts.
- Assess the suitability of different data analysis methods and technologies for specific problem domains, considering both their capabilities and limitations.
- Describe and compare relational and NoSQL database models, including schema design considerations and associated trade-offs.
- Evaluate distributed file systems and data processing frameworks in terms of scalability, fault tolerance, and performance.
- Design and implement a basic data processing pipeline using appropriate tools and technologies.
Syllabus
- Fundamentals of Python programming for data analysis, including data structures, libraries, and scripting for automation.
- Overview of supervised learning techniques for classification and regression, and unsupervised models for clustering. Emphasis on both theoretical foundations and practical application.
- Overview of storage solutions, including relational and NoSQL databases, distributed file systems, and cloud solutions.
- Distributed data processing using current data analysis tools, pipeline approaches, and dashboards.
Recommended Reading
This list is indicative only – the class lecturer may recommend alternative reading material. Please do not purchase any of the reading material listed below until you have confirmed with the class lecturer that it will be used for this class.
Learning Python. Lutz, M., 6th edition, O’Reilly Media, Inc., 2025.
Python for Data Analysis: Data Wrangling with pandas, NumPy, and Jupyter. McKinney, W., 3rd edition, O’Reilly Media, Inc., 2022.
Learning Spark: Lightning-Fast Data Analytics. Damji, J. et al., 2nd edition, O’Reilly Media, Inc., 2020.
Last updated: 2025-08-19 10:58:58