Course syllabus adopted 2021-02-26 by Head of Programme (or corresponding).
Overview
- Swedish name: Statistik för stora datamängder
- Code: MVE441
- Credits: 7.5 credits
- Owner: MPENM
- Education cycle: Second-cycle
- Main field of study: Mathematics
- Department: MATHEMATICAL SCIENCES
- Grading: TH - Pass with distinction (5), Pass with credit (4), Pass (3), Fail
Course round 1
- Teaching language: English
- Application code: 20136
- Open for exchange students: Yes
Credit distribution
Module | Sp1 | Sp2 | Sp3 | Sp4 | Summer | Not Sp | Examination dates |
---|---|---|---|---|---|---|---|
0120 Project 1.5 c Grading: UG | 1.5 c | | | | | | |
0220 Take-home examination 6 c Grading: TH | 6 c | | | | | | |
In programmes
- MPCAS - COMPLEX ADAPTIVE SYSTEMS, MSC PROGR, Year 1 (compulsory elective)
- MPDSC - DATA SCIENCE AND AI, MSC PROGR, Year 1 (compulsory elective)
- MPENM - ENGINEERING MATHEMATICS AND COMPUTATIONAL SCIENCE, MSC PROGR, Year 1 (compulsory elective)
Examiner
- Rebecka Jörnsten
- Full Professor, Applied Mathematics and Statistics, Mathematical Sciences
Eligibility
General entry requirements for Master's level (second cycle)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.
Specific entry requirements
English 6 (or by other approved means with the equivalent proficiency level)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.
Course specific prerequisites
The prerequisites for the course are a basic course in statistical inference and MVE190 Linear Statistical Models. Students who lack these prerequisites can contact the course instructor for permission to take the course.
Aim
The course gives an understanding of, and training in, techniques for statistical analysis of large data sets.
Learning outcomes (after completion of the course the student should be able to)
- demonstrate understanding of the key concepts and ideas concerning classification, clustering and dimension reduction.
- solve high-dimensional data analysis exercises and interpret the results of such analyses.
Content
Overview of high-dimensional data analysis
- Classification: Bayes rule, discriminant analysis methods, nearest neighbor classifier, classification and regression trees.
- Cost functions, greedy searches, gradient descent, cross-validation.
- Logistic regression.
- Regularization methods. Sparse logistic regression, sparse discriminant analysis.
- Ensemble methods: bagging, random projections, random forests.
- Clustering: k-means, hierarchical clustering, model-based clustering, spectral methods.
- Dimension reduction: PCA, canonical correlation, multi-dimensional scaling.
- Special topics (subset of the following): networks and graphical models, sparse covariance estimation, network clustering and community detection, neural networks, matrix completion, collaborative filtering.
- Large-scale learning: stochastic searches, batch methods, online learning.
Organisation
The teaching consists of lectures, discussions, and reading assignments.
Literature
To be announced.
Examination including compulsory elements
Oral and/or written examination.
The course examiner may assess individual students in other ways than what is stated above if there are special reasons for doing so, for example if a student has a decision from Chalmers on educational support due to disability.