Machine learning for natural language processing

Course syllabus adopted 2021-02-26 by Head of Programme (or corresponding).

Overview

Swedish name: Maskininlärning för språkteknologi
Code: DAT450
Credits: 7.5 credits
Owner: MPDSC
Education cycle: Second-cycle
Main field of study: Software Engineering
Department: Computer Science and Engineering
Grading: TH - Pass with distinction (5), Pass with credit (4), Pass (3), Fail

Course round 1

Teaching language: English
Application code: 87123
Maximum participants: 50
Minimum participants: 10
Block schedule
Open for exchange students: No

Credit distribution

Module 0120, Written and oral assignments: 7.5 c, Grading: TH
Credits by study period: Sp2 7.5 c

In programmes

Examiner

Richard Johansson
Head of Unit, Data Science and AI, Computer Science and Engineering

Eligibility

General entry requirements for Master's level (second cycle)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Specific entry requirements

English 6 (or by other approved means with the equivalent proficiency level)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Course specific prerequisites

The course requires at least 7.5 credits of programming, 7.5 credits of probability theory or statistics, and a first course in machine learning, such as DAT340, TDA233, SSY340 or MVE440.

Aim

The course gives an introduction to machine learning models and architectures used in modern natural language processing (NLP) systems.

Learning outcomes (after completion of the course the student should be able to)

Knowledge and understanding:

describe the fundamentals of storing textual data for the world's languages,
describe the most common types of natural language processing tasks,
describe the most common types of machine learning models used in modern natural language processing,
explain how text data can be annotated for a natural language processing task where machine learning techniques are used.

Competence and skills:

apply software libraries that use machine learning for common natural language processing tasks,
write code that implements some machine learning models for natural language processing,
apply evaluation methods to assess the quality of natural language processing systems.

Judgement and approach:

discuss the advantages and limitations of different machine learning models with respect to a given task in natural language processing,
reason about what type of data could be useful when training a model for a given natural language processing task,
select the appropriate evaluation methodology for a natural language processing system and motivate this choice,
reason about ethical questions pertaining to machine learning-based natural language processing systems, such as stereotypes and under-representation.

Content

Rapid developments in machine learning have revolutionized the field of NLP, including commercially important applications such as translation, summarization, and information extraction. However, natural language data exhibit a number of peculiarities that make them more challenging to work with than many other types of data commonly encountered in machine learning: natural language is discrete, structured, and highly ambiguous. It is also extremely diverse: not only are there thousands of languages in the world, but within each language there is substantial variation in style and genre. Furthermore, many of the phenomena encountered in language follow long-tailed statistical distributions, which makes the production of training data more costly. For these reasons, machine learning architectures for NLP applications tend to be quite different from those used in other fields.

The course covers the following broad areas:

working practically with text data, including fundamental tasks such as tokenization and word counting;
probabilistic models for text, such as topic models;
an overview of the most common types of NLP applications;
architectures for representation in NLP models, including word embeddings, convolutional and recurrent neural networks, and attention models;
machine learning models for common types of NLP problems, mainly categorization, sequence labeling, structured prediction and generation;
approaches to transfer learning in NLP.

Organisation

Lectures and computer labs

Literature

Course literature will be announced at the latest 8 weeks prior to the start of the course.

Examination including compulsory elements

The course is examined by mandatory assignments submitted as written reports, as well as a self-defined project that requires the submission of a written report and an oral presentation. Some of the assignments are carried out individually and others in groups of normally 2-4 students. The project is carried out in groups of 2-4 students.

A late submission of an assignment or the project results in the grade Fail (U), unless special reasons exist. A student who fails an assignment or the project will be given the opportunity to submit a new solution on subsequent occasions when the course is given.

A passing grade for the entire course requires at least a passing grade on all assignments and the project.
To be awarded a higher passing grade for the entire course, the student must, in addition, achieve a correspondingly higher weighted average of the grades on the assignments and the project.

The course examiner may assess individual students in other ways than what is stated above if there are special reasons for doing so, for example if a student has a decision from Chalmers about disability study support.
