Computational techniques for large-scale data

The course syllabus contains changes

Course syllabus adopted 2021-02-17 by Head of Programme (or equivalent).

Overview

Swedish name: Beräkningsmetoder för storskaliga data
Code: DAT470
Credits: 7.5 credits
Owner: MPDSC
Education cycle: Second-cycle
Main field of study: Computer Science and Engineering, Software Engineering
Department: COMPUTER SCIENCE AND ENGINEERING
Grading: TH - Pass with distinction (5), Pass with credit (4), Pass (3), Fail

Course round 1

Teaching language: English
Application code: 87117
Maximum participants: 100 (at least 10% of the seats are reserved for exchange students)
Block schedule
Open for exchange students: Yes

Credit distribution

Module	Sp1	Sp2	Sp3	Sp4	Summer	Not Sp	Examination dates
0121 Written and oral assignments, 4.5 c, Grading: UG				4.5 c
0221 Examination, 3 c, Grading: TH				3 c			31 May 2025 am J, 11 Oct 2024 am J, 21 Aug 2025 pm J

In programmes

Examiner

Matti Karppa
Senior Lecturer, Data Science and AI, Computer Science and Engineering
Contact
- karppa@chalmers.se

Eligibility

General entry requirements for Master's level (second cycle)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Specific entry requirements

English 6 (or by other approved means with the equivalent proficiency level)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Course specific prerequisites

To be eligible for the course, the student should have a Bachelor's degree in any subject, or have successfully completed 90 credits of studies in computer science, software engineering, or equivalent. Specifically, at least 15 credits of successfully completed courses in programming are required, including at least 7.5 credits in Python programming or equivalent. The student must also have successfully completed a course in probability theory or statistics, for example MVE051, TMS137, or similar.

The course cannot be included in a degree which contains DAT345 or DAT346. Neither can the course be included in a degree which is based on another degree in which the course DAT345 or DAT346 is included.

Aim

The advent of big data has led to the development of new programming paradigms, in particular for parallel systems that allow computation with big data on redundant clusters of commodity computers. This course provides an introduction to different programming paradigms, e.g. MapReduce and its extensions, which facilitate computations with terabytes of data. It also demonstrates that, for specific tasks, algorithms and data structures can provide highly efficient alternatives.

Learning outcomes (after completion of the course the student should be able to)

After completion of the course the student should be able to:

Learning objectives

discuss important technological aspects when designing and implementing analysis solutions for large-scale data,
explain differences between parallel programming models,
describe data structures and algorithms for big data and discuss their utility

Skills and abilities

implement applications for transforming and analyzing large-scale data with different parallel software frameworks,
use algorithms and data structures for computations with large-scale data

Judgement ability and approach

suggest appropriate computational infrastructures and methodological approaches for analysis tasks and discuss their advantages and drawbacks,
discuss advantages and drawbacks of different strategies of parallelization,
decide between algorithmic and parallelization-based approaches for accelerating computational workloads

Content

The aim of this course is to deepen the students' knowledge and skills and familiarize them with the technical and technological side of data science, including software and hardware environments. The course will introduce aspects of designing and implementing large-scale data science solutions.

In particular, the course will include:

an overview of computer architectures, algorithmic approaches, and high-performance computing infrastructures with a focus on limitations for processing large-scale data,
an introduction to relevant frameworks for cluster computing with large-scale data,
implementation of data analysis tools on a cluster using Python and appropriate software frameworks,
data structures and algorithms, such as index structures, which can greatly accelerate computations with large-scale data

Organisation

Lectures, computer lab sessions, and exercise sessions.

Literature

Course literature will be announced no later than 8 weeks before the start of the course.

Examination including compulsory elements

The course is examined by a written hall examination and by mandatory written assignments, some of which are carried out individually and others in groups of normally 2-4 students.
There will also be non-obligatory individual assignments that grant bonus points for the written examination. These bonus points remain valid for the next two scheduled re-examinations.

The course examiner may assess individual students in other ways than what is stated above if there are special reasons for doing so, for example if a student has a decision from Chalmers about disability study support.

The course syllabus contains changes

Changes to examination:
- 2025-02-17: Examination time changed from Afternoon to Morning by Matti Karppa
  [2025-05-31 3.0 hec, 0221]
