Course syllabus for Machine learning for natural language processing

Course syllabus adopted 2020-08-18 by Head of Programme (or corresponding).

Overview

  • Swedish name: Maskininlärning för språkteknologi
  • Code: DAT450
  • Credits: 7.5 Credits
  • Owner: MPDSC
  • Education cycle: Second-cycle
  • Main field of study: Software Engineering
  • Department: COMPUTER SCIENCE AND ENGINEERING
  • Grading: TH - Pass with distinction (5), Pass with credit (4), Pass (3), Fail

Course round 1

  • Teaching language: English
  • Application code: 87120
  • Maximum participants: 50
  • Minimum participants: 10
  • Block schedule
  • Open for exchange students: No

Credit distribution

0120 Written and oral assignments, 7.5 c (Grading: TH)

Eligibility

General entry requirements for Master's level (second cycle)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Specific entry requirements

English 6 (or by other approved means with the equivalent proficiency level)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Course specific prerequisites

The course requires at least 7.5 credits of programming, 7.5 credits of probability theory or statistics, and a first course in machine learning, such as DAT340, TDA233, SSY340 or MVE440.

Aim

The course gives an introduction to machine learning models and architectures used in modern natural language processing (NLP) systems.

Learning outcomes (after completion of the course the student should be able to)

Knowledge and understanding:
  • describe the fundamentals of storing textual data for the world's languages,
  • describe the most common types of natural language processing tasks,
  • describe the most common types of machine learning models used in modern natural language processing,
  • explain how text data can be annotated for a natural language processing task where machine learning techniques are used.
Competence and skills:
  • apply software libraries using machine learning for common natural language processing tasks,
  • write the code to implement some machine learning models for natural language processing,
  • apply evaluation methods to assess the quality of natural language processing systems.
Judgement and approach:
  • discuss the advantages and limitations of different machine learning models with respect to a given task in natural language processing,
  • reason about what type of data could be useful when training a model for a given natural language processing task,
  • select the appropriate evaluation methodology for a natural language processing system and motivate this choice,
  • reason about ethical questions pertaining to machine learning based natural language processing systems, such as stereotypes and under-representation.

Content

Rapid developments in machine learning have revolutionized the field of NLP, including commercially important applications such as translation, summarization, and information extraction. However, natural language data exhibit a number of peculiarities that make them more challenging to work with than many other types of data commonly encountered in machine learning: natural language is discrete, structured, and highly ambiguous. It is also extremely diverse: not only are there thousands of languages in the world, but within each language there is substantial variation in style and genre. Furthermore, many of the phenomena encountered in language follow long-tail statistical distributions, which makes the production of training data more costly. For these reasons, machine learning architectures for NLP applications tend to be quite different from those used in other fields.

The course covers the following broad areas:
  • working practically with text data, including fundamental tasks such as tokenization and word counting;
  • probabilistic models for text, such as topic models;
  • overview of the most common types of NLP applications;
  • architectures for representation in NLP models, including word embeddings, convolutional and recurrent neural networks, and attention models;
  • machine learning models for common types of NLP problems, mainly categorization, sequence labeling, structured prediction and generation;
  • approaches to transfer learning in NLP.

Organisation

Lectures and computer labs

Literature

Course literature will be announced no later than 8 weeks before the start of the course.

Examination including compulsory elements

The course is examined through mandatory assignments submitted as written reports, as well as a self-defined project that requires the submission of a written report and an oral presentation. Some of the assignments are carried out individually and others in groups of normally 2-4 students. The project is carried out in groups of 2-4 students.
Late submission of an assignment or the project results in the grade Fail (U), unless special reasons exist. A student who fails an assignment or the project will be given the opportunity to submit a new solution on subsequent occasions when the course is given.
A passing grade for the entire course requires a passing grade on all assignments and the project.
To be awarded a higher passing grade for the entire course, the student must, in addition, achieve a higher weighted average grade on the assignments and the project.

The course syllabus contains changes

  • Changes to course rounds:
    • 2020-08-18: Examiner Richard Johansson (richajo) added by Vice Head of Department
      [Course round 1]
    • 2020-05-11: Block A added by the scheduling group
      [Course round 1]
    • 2020-04-22: MIN_PART 10 added by PA
      [Course round 1]
    • 2020-04-22: Max number of participants changed from 10 to 50 by PA
      [Course round 1]
  • Changes to course:
    • 2020-08-18: Prerequisites changed by UBS/Examiner
      Updated course code; TDA233 instead of TDA231