CMT122: Machine Learning for NLP
| School | Cardiff School of Computer Science and Informatics |
| Department Code | COMSC |
| Module Code | CMT122 |
| External Subject Code | 100992 |
| Number of Credits | 20 |
| Level | L7 |
| Language of Delivery | English |
| Module Leader | Dr Nedjma Ousidhoum |
| Semester | Autumn Semester |
| Academic Year | 2025/6 |
Outline Description of Module
Machine Learning (ML), the study of methods for developing computer programs able to learn from examples or prior experience, is currently well established in areas such as computer vision, robotics, artificial intelligence and cybersecurity. Natural Language Processing (NLP) similarly benefits dramatically from ML techniques. However, linguistic data poses unique challenges, owing among other things to the existence of different language typologies and the various forms in which linguistic data appears (social media, patents, news stories or domain-specific documents, e.g., medical or financial). This module will expose you to core ML algorithms and their application to typical NLP tasks, discussing their strengths and limitations. Fundamental ML concepts such as data preparation, model optimization, feature selection and experimental design will also be covered, always with a strong focus on the NLP context.
On completion of the module a student should be able to
- Identify, implement and evaluate ML pipelines in the context of NLP
- Critically assess the suitability of an ML algorithm for a given NLP task and dataset
- Implement, debug and optimize neural network architectures for NLP problems
- Track, package and deliver datasets and models following good ML practices
- Use, and comprehensively justify the use of, third-party APIs for NLP
- Critically reflect on the role of data in the performance of ML models for NLP
How the module will be delivered
This module is delivered through a combination of lectures and labs. Lectures teach the main principles and are used to prepare and review the lab assignments. During the labs, students put these principles into practice, e.g. by implementing a method for handling text data, or by reflecting on the performance and limitations of a given method.
Skills that will be practised and developed
- Implementing end-to-end ML solutions for NLP using Python
- Retrieving, cleaning, processing and enriching text data
- Obtaining insights from unannotated text data using unsupervised ML techniques such as clustering
- Investigating nuances in textual datasets to prevent unwanted biases in subsequent ML models
- Critically thinking about which tools are appropriate in what contexts
- Formalising real-world problems in a rigorous way
How the module will be assessed
The module is assessed through a blend of assessment types, which may include programming assignments, coursework and a group project. Formative assessments will be provided in the form of lab sheets with group feedback.
Assessment Details
Assessment Breakdown
| Type | % | Title | Duration (hrs) |
|---|---|---|---|
| Written Assessment | 50 | Individual Assessment: Implementation and Evaluation | N/A |
| Written Assessment | 50 | Machine Learning Group Project on an NLP Problem | N/A |
Syllabus content
- Historical perspectives on Machine Learning for Natural Language Processing
- Data preprocessing: feature analysis and selection, dimensionality reduction
- Machine Learning Evaluation: designing experiments, cross-validation, statistical testing, evaluation metrics (e.g. accuracy, precision, recall, F1)
- Supervised ML for NLP
  - Linear ML (SVMs, logistic regression)
  - Neural networks
- Unsupervised
  - Clustering
  - Generative models (e.g., LDA)
- Self-supervised learning
  - Embeddings
  - Language models
- Python ML frameworks (TensorFlow, PyTorch, transformers and simpletransformers, scikit-learn, Flair)
- Large Language Models