Statistical learning for big data

Course syllabus adopted 2021-02-26 by Head of Programme (or corresponding).

Overview

Swedish name: Statistik för stora datamängder
Code: MVE441
Credits: 7.5 credits
Owner: MPENM
Education cycle: Second-cycle
Main field of study: Mathematics
Department: Mathematical Sciences
Grading: TH - Pass with distinction (5), Pass with credit (4), Pass (3), Fail

Course round 1

Teaching language: English
Application code: 20112
Open for exchange students: Yes

Credit distribution

Module	Sp1	Sp2	Sp3	Sp4	Summer	Not Sp	Examination dates
0120 Project 1.5 c Grading: UG				1.5 c
0220 Take-home examination 6 c Grading: TH				6 c

In programmes

Examiner

Rebecka Jörnsten
Full Professor, Applied Mathematics and Statistics, Mathematical Sciences

Eligibility

General entry requirements for Master's level (second cycle)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Specific entry requirements

English 6 (or by other approved means with the equivalent proficiency level)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Course specific prerequisites

The prerequisites are a basic course in statistical inference and MVE190 Linear Statistical Models. Students may also contact the course instructor for permission to take the course.

Aim

The course aims to provide an understanding of, and training in, techniques for the statistical analysis of large data sets.

Learning outcomes (after completion of the course the student should be able to)

demonstrate understanding of the key concepts and ideas concerning classification, clustering and dimension reduction.
solve high-dimensional data analysis exercises and interpret the results of such analyses.

Content

Overview of high-dimensional data analysis.
Classification: Bayes rule, discriminant analysis methods, nearest neighbor classifier, classification and regression trees.
Cost functions, greedy searches, gradient descent, cross-validation.
Logistic regression.
Regularization methods. Sparse logistic regression, sparse discriminant analysis.
Ensemble methods: bagging, random projections, random forests.
Clustering: k-means, hierarchical clustering, model-based clustering, spectral methods.
Dimension reduction: PCA, canonical correlation, multi-dimensional scaling.
Special topics (subset of the following): networks and graphical models, sparse covariance estimation, network clustering and community detection, neural networks, matrix completion, collaborative filtering.
Large-scale learning: stochastic searches, batch methods, online learning.

Organisation

Teaching consists of lectures, discussions, and reading assignments.

Literature

To be announced.

Examination including compulsory elements

Oral and/or written examination.

The course examiner may assess individual students in ways other than those stated above if there are special reasons for doing so, for example if a student has a decision from Chalmers on disability study support.
