CMT655: Manipulating and Exploiting Data
| School | Cardiff School of Computer Science and Informatics |
| Department Code | COMSC |
| Module Code | CMT655 |
| External Subject Code | 100370 |
| Number of Credits | 20 |
| Level | L7 |
| Language of Delivery | English |
| Module Leader | Dr Nico Potyka |
| Semester | Spring Semester |
| Academic Year | 2025/6 |
Outline Description of Module
Modern applications rely on data to provide a meaningful experience to users. This module builds on the use of databases in previous modules. It discusses the definition of data and its sources, then examines the data manipulation layer of a multi-tiered architecture, exploring how to convert data into meaningful results using techniques such as classification and how to work with large-scale databases.
On completion of the module a student should be able to
- Critically assess the legal, social and ethical implications of data collection, storage, and knowledge extraction.
- Manipulate a range of supplied data into usable formats for various purposes, justifying the use of appropriate tools and critically appraising their underlying mechanisms and concepts.
- Design software based on requirements that uses third party libraries for processing and extracting knowledge (e.g., for statistical analysis and natural language processing tasks).
- Package processed data in suitable formats for systems to use (e.g., JSON, CSV, web API).
- Analyse, design, and transform a conceptual schema into the schema of an appropriate database management system.
How the module will be delivered
The module will be delivered through a combination of lectures, supervised lab sessions and tutorials as appropriate. You will be expected to attend all timetabled sessions and engage with online material. You will be guided through learning activities appropriate to your module, which may include:
online resources that you work through at your own pace (e.g. videos, web resources, e-books, quizzes),
online interactive sessions to work with other students and staff (e.g. discussions, live streaming of presentations, live-coding, team meetings),
face-to-face small group sessions (e.g. help classes, feedback sessions).
Skills that will be practised and developed
Design appropriate software to generate meaningful content from data.
Process data sets of varying sizes, adopting appropriate data science techniques.
Use third party APIs and libraries for the extraction of knowledge.
Professional and ethical behaviour, time management, and critical and defensible thinking.
How the module will be assessed
A blend of assessment types, which may include coursework and portfolio assessments, class tests, and/or formal examinations.
Students will be provided with reassessment opportunities in line with University regulations.
Assessment Breakdown
| Type | % | Title | Duration(hrs) |
|---|---|---|---|
| Portfolio | 100 | Portfolio | N/A |
Syllabus content
Using third party libraries (e.g., Pandas, NumPy)
Basic notions of statistics
Data cleaning and processing tasks
Statistical analysis programming tasks (e.g., regression, correlation)
Data formats (e.g., JSON, CSV)
Legal, social, and ethical considerations in analysing data
Designing database systems (e.g., relational and non-relational databases)
Using database systems for data processing tasks (e.g., SQL)