CMT122: Machine Learning for NLP

School	Cardiff School of Computer Science and Informatics
Department Code	COMSC
Module Code	CMT122
External Subject Code	100992
Number of Credits	20
Level	L7
Language of Delivery	English
Module Leader	Dr Nedjma Ousidhoum
Semester	Autumn Semester
Academic Year	2025/6

Outline Description of Module

Machine Learning (ML), the study of methods for developing computer programs able to learn from examples or prior experience, is currently well established in areas such as computer vision, robotics, artificial intelligence and cybersecurity. Natural Language Processing (NLP) similarly benefits dramatically from ML techniques. However, language data poses unique challenges, owing among other things to the existence of different language typologies and the various forms in which linguistic data appears (social media, patents, news stories or domain-specific documents, e.g., medical or financial). This module will expose you to core ML algorithms and their application to typical NLP tasks, discussing their strengths and limitations. Fundamental ML concepts such as data preparation, model optimization, feature selection and experimental design will also be covered, always with a strong focus on the NLP context.

On completion of the module a student should be able to

Identify, implement and evaluate ML pipelines in the context of NLP

Critically assess the suitability of an ML algorithm for a given NLP task and dataset

Implement, debug and optimize neural network architectures for NLP problems

Track, package and deliver datasets and models following good ML practices

Use, and comprehensively justify the use of, third-party APIs for NLP

Critically reflect on the role of data in the performance of ML models for NLP

How the module will be delivered

This module will use a combination of lectures and labs. The lectures will be used to teach the main principles and to prepare and review the lab assignments. During the labs, students will put these principles into practice, e.g. by implementing a method for handling text data, or by reflecting on the performance and limitations of a given method.

Skills that will be practised and developed

Implementing end-to-end ML solutions for NLP using Python

Retrieving, cleaning, processing and enriching text data

Obtaining insights from unannotated text data using unsupervised ML techniques such as clustering

Investigating nuances in textual datasets to prevent unwanted biases in subsequent ML models

Critically thinking about which tools are appropriate in what contexts

Formalising real-world problems in a rigorous way

How the module will be assessed

A blend of assessment types, which may include programming assignments, coursework and a group project. Formative assessments will be provided in the form of lab sheets with group feedback.

Assessment Details

Assessment Breakdown

Type	%	Title	Duration (hrs)
Written Assessment	50	Individual Assessment: Implementation And Evaluation	N/A
Written Assessment	50	Machine Learning Group Project On An NLP Problem	N/A

Syllabus content

Historical perspectives on Machine Learning for Natural Language Processing

Data preprocessing: feature analysis and selection, dimensionality reduction

Machine Learning Evaluation: designing experiments, cross-validation, statistical testing, evaluation metrics (e.g. accuracy, precision, recall, F1)

Supervised ML for NLP

Linear ML (SVMs, logistic regression)

Neural networks

Unsupervised ML for NLP

Clustering

Generative models (e.g., LDA)

Self-supervised learning

Embeddings

Language models

Python ML frameworks (TensorFlow, PyTorch, Transformers and Simple Transformers, scikit-learn, Flair)

Large Language Models

Copyright Cardiff University. Registered charity no. 1136855