Course syllabus for Computational techniques for large-scale data

Course syllabus adopted 2021-02-17 by Head of Programme (or corresponding).

Overview

  • Swedish name: Beräkningsmetoder för storskaliga data
  • Code: DAT470
  • Credits: 7.5 credits
  • Owner: MPDSC
  • Education cycle: Second-cycle
  • Main field of study: Computer Science and Engineering, Software Engineering
  • Department: COMPUTER SCIENCE AND ENGINEERING
  • Grading: TH - Pass with distinction (5), Pass with credit (4), Pass (3), Fail

Course round 1

  • Teaching language: English
  • Application code: 87117
  • Maximum participants: 100 (at least 10% of the seats are reserved for exchange students)
  • Block schedule
  • Open for exchange students: Yes

Credit distribution

0121 Written and oral assignments, 4.5 c
Grading: UG
0221 Examination, 3 c
Grading: TH
  • 11 Oct 2024 am J

Eligibility

General entry requirements for Master's level (second cycle)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Specific entry requirements

English 6 (or by other approved means with the equivalent proficiency level)
Applicants enrolled in a programme at Chalmers where the course is included in the study programme are exempted from fulfilling the requirements above.

Course specific prerequisites

To be eligible for the course, the student should have a Bachelor's degree in any subject, or have successfully completed 90 credits of studies in computer science, software engineering, or equivalent. Specifically, at least 15 credits of successfully completed courses in programming are required, including at least 7.5 credits in Python programming or equivalent. The student must also have successfully completed a course in probability theory or statistics, for example MVE051, TMS137 or similar.

The course cannot be included in a degree which contains DAT345 or DAT346. Neither can the course be included in a degree which is based on another degree in which the course DAT345 or DAT346 is included.

Aim

The advent of big data has led to the development of new programming paradigms, in particular for parallel systems that allow computation with big data on redundant clusters of commodity computers. This course provides an introduction to different programming paradigms, e.g. MapReduce and its extensions, which facilitate computations with terabytes of data. It also demonstrates that, for specific tasks, algorithms and data structures can provide highly efficient alternatives.
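
As a rough illustration of the MapReduce paradigm mentioned above, the following single-process Python sketch counts words in three phases (map, shuffle, reduce). It is not course material: the function names are hypothetical, and a real framework would distribute each phase across the machines of a cluster rather than run them in one process.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values of one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence (illustrative, not distributed)."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data on clusters", "big clusters of commodity computers"]
    print(word_count(docs))
```

Because each call to `map_phase` depends on one document only, and each call to `reduce_phase` on one key only, both phases can in principle be run in parallel on separate workers.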

Learning outcomes (after completion of the course the student should be able to)

Learning objectives
  • discuss important technological aspects when designing and implementing analysis solutions for large-scale data,
  • explain differences between parallel programming models,
  • describe data structures and algorithms for big data and discuss their utility

Skills and abilities
  • implement applications for transforming and analyzing large-scale data with different parallel software frameworks,
  • use algorithms and data structures for computations with large-scale data

Judgement ability and approach
  • suggest appropriate computational infrastructures and methodological approaches for analysis tasks and discuss their advantages and drawbacks,
  • discuss advantages and drawbacks of different strategies of parallelization,
  • decide between algorithmic and parallelization-based approaches for accelerating computational workloads


Content

This course deepens the students’ knowledge and skills in the technical and technological side of data science, including both software and hardware environments. It introduces aspects of designing and implementing large-scale data science solutions.

In particular, the course will include:
  • an overview of computer architectures, algorithmic approaches, and high-performance computing infrastructures with a focus on limitations for processing large-scale data,
  • an introduction to relevant frameworks for cluster computing with large-scale data,
  • implementation of data analysis tools on a cluster using Python and appropriate software frameworks,
  • data structures and algorithms, such as index structures, which can greatly accelerate computations with large-scale data
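
To give a sense of how an index structure can accelerate a computation, the sketch below compares a linear scan with a lookup in a sorted index for counting values in a range. It is a minimal illustration under assumptions, not part of the syllabus: the function names are hypothetical and a sorted Python list stands in for a real index structure.

```python
from bisect import bisect_left, bisect_right


def count_in_range_scan(values, low, high):
    """Linear scan: inspects every value, O(n) per query."""
    return sum(1 for v in values if low <= v <= high)


def build_index(values):
    """Build a simple sorted index once, O(n log n)."""
    return sorted(values)


def count_in_range_indexed(index, low, high):
    """Binary search on the sorted index: O(log n) per query."""
    return bisect_right(index, high) - bisect_left(index, low)


if __name__ == "__main__":
    data = [5, 1, 9, 3, 7, 3]
    index = build_index(data)
    print(count_in_range_scan(data, 3, 7))
    print(count_in_range_indexed(index, 3, 7))
```

The index costs time and memory to build, so it pays off mainly when many queries are run against the same data.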

Organisation

Lectures, computer lab sessions, and exercise sessions.

Literature

Course literature will be announced no later than 8 weeks before the start of the course.

Examination including compulsory elements

The course is examined by a written hall examination and by mandatory written assignments, some of which are carried out individually and others in groups of normally 2-4 students.
There will also be non-obligatory individual assignments that grant bonus points for the written examination. These bonus points remain valid for the next two scheduled re-examinations.

The course examiner may assess individual students in other ways than what is stated above if there are special reasons for doing so, for example if a student has a decision from Chalmers on educational support due to disability.