Course syllabus for Statistical learning for big data

Course syllabus adopted 2021-02-26 by Head of Programme (or corresponding).

Overview

  • Swedish name: Statistik för stora datamängder
  • Code: MVE441
  • Credits: 7.5 credits
  • Owner: MPENM
  • Education cycle: Second-cycle
  • Main field of study: Mathematics
  • Department: Mathematical Sciences
  • Grading: TH - Pass with distinction (5), Pass with credit (4), Pass (3), Fail

Course round 1

  • Teaching language: English
  • Application code: 20112
  • Open for exchange students: Yes

Credit distribution

0120 Project, 1.5 c
Grading: UG
0 c | 0 c | 0 c | 1.5 c | 0 c | 0 c
0220 Take-home examination, 6 c
Grading: TH
0 c | 0 c | 0 c | 6 c | 0 c | 0 c

In programmes

Examiner

Eligibility

General entry requirements for Master's level (second cycle)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Specific entry requirements

English 6 (or by other approved means with the equivalent proficiency level)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Course specific prerequisites

The prerequisites for the course are a basic course in statistical inference and MVE190 Linear Statistical Models. Students who lack these prerequisites can contact the course instructor for permission to take the course.

Aim

The course gives an understanding of, and training in, techniques for the statistical analysis of large data sets.

Learning outcomes (after completion of the course the student should be able to)

  • demonstrate understanding of the key concepts and ideas concerning classification, clustering and dimension reduction.
  • solve high-dimensional data analysis exercises and interpret the results of such analyses.


Content

  • Overview of high-dimensional data analysis.
  • Classification: Bayes' rule, discriminant analysis methods, nearest neighbor classifier, classification and regression trees.
  • Cost functions, greedy searches, gradient descent, cross-validation.
  • Logistic regression.
  • Regularization methods: sparse logistic regression, sparse discriminant analysis.
  • Ensemble methods: bagging, random projections, random forests.
  • Clustering: k-means, hierarchical clustering, model-based clustering, spectral methods.
  • Dimension reduction: PCA, canonical correlation, multi-dimensional scaling.
  • Special topics (subset of the following): networks and graphical models, sparse covariance estimation, network clustering and community detection, neural networks, matrix completion, collaborative filtering.
  • Large-scale learning: stochastic searches, batch methods, online learning.
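
As an illustration of how two of the listed topics fit together, the sketch below evaluates a nearest neighbor classifier with cross-validation. It is a minimal example and not course material: the function names `knn_predict` and `cross_val_accuracy`, the toy data, and the choices of 3 neighbors and 5 folds are all assumptions made for this illustration.

```python
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    # Squared Euclidean distance is sufficient for ranking neighbors.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k=3, n_folds=5, seed=0):
    """Estimate k-NN accuracy with n_folds-fold cross-validation."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    # Every index lands in exactly one held-out fold.
    folds = [idx[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(
            knn_predict(train_X, train_y, X[i], k) == y[i] for i in fold
        )
    return correct / len(X)


# Toy data, made up for illustration: two well-separated Gaussian clusters in 2D.
rng = random.Random(1)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)] + \
    [[rng.gauss(6, 1), rng.gauss(6, 1)] for _ in range(50)]
y = [0] * 50 + [1] * 50

accuracy = cross_val_accuracy(X, y, k=3, n_folds=5)
print(f"5-fold CV accuracy of 3-NN: {accuracy:.2f}")
```

Each observation is held out exactly once, so the reported accuracy is computed only on points the classifier did not see during the corresponding fit. In practice the same loop would be repeated over several values of k to choose the number of neighbors.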


Organisation

Teaching consists of lectures, discussions, and reading assignments.

Literature

To be announced.

Examination including compulsory elements

Oral and/or written examination.


The course examiner may assess individual students in other ways than what is stated above if there are special reasons for doing so, for example if a student has a decision from Chalmers on educational support due to disability.