Education e-Commons

Introduction to using digital resources in research

Course information

This course aims to provide PhD students with the tools to make more effective use of digital resources and techniques, improving both the efficiency and the quality of their research. The curriculum covers tools, tips, and tricks that are industry standard both within and outside research. These topics will be taught by experts at Chalmers e-Commons, Chalmers' digital research infrastructure.

Each topic is taught as a separate module and takes a full day. Every module is taught at an introductory level: the subject is introduced and students are helped over the first hurdle of familiarizing themselves with the topic.

The total number of modules is 12, and 10 modules are required to pass the course. Students can choose between two sets of optional modules, referred to below as A or B.

If you would like to formally include the course in your graduate study program, you need to discuss this with your supervisor.

Planning

The course runs as weekly modules during the fall of 2024. Each module lasts a full day; the first will be taught on Friday, September 20, and the last on Friday, December 6.

For more information, and to sign up, contact Leon Boschman (leon.boschman@chalmers.se).

Modules

This course consists of 12 modules: 

  1. Unix shell
  2. Basic Python
  3. Structured data analysis
  4. Data visualization
  5. High performance computing
  6. Research data management
  7. Version control & collaboration
  8. Writing readable code
  9. Using Python Notebooks for communication (A1)
  10. Digital project management (A2)
  11. Introduction to machine learning & AI (B1)
  12. Ethics of AI (B2)

Unix shell (Bash)

The Unix terminal is a powerful tool that is used frequently in computational science. Most supercomputers only offer a terminal environment, with no graphical user interface, so experience with the terminal is essential for working with HPC facilities. It is also needed for running slightly more advanced analysis scripts.

At the end, students will know some basic Bash commands and how to navigate the file system from a terminal.

Basic Python

Python is an all-round programming language widely used in data analysis, scientific computing, and machine learning.

This module aims to teach students the basics of Python and help them set up a Python environment on their own laptop. They will do this in a beginner-friendly Jupyter Notebook environment.

At the end, they will be able to write and execute a small notebook that uses functions, lists, dictionaries, and other basic constructs.
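
As an indication of the level aimed for, a minimal example (made up for this description) of the kind of notebook cell students will be able to write, using a function, a list, and a dictionary:

    # Count how often each value occurs in a small list of measurements.
    def count_values(values):
        counts = {}  # dictionary mapping each value to its number of occurrences
        for value in values:
            counts[value] = counts.get(value, 0) + 1
        return counts

    measurements = [2, 3, 2, 5, 3, 2]
    print(count_values(measurements))  # prints {2: 3, 3: 2, 5: 1}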

Structured data analysis

In this module, students will learn how to write a reusable data analysis in Python. They will learn the basics of pandas, numpy, and scipy for working with tabular and numerical data, as well as the basics of turning an analysis into a reusable workflow.
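
To give a flavour of the kind of analysis covered, a minimal sketch assuming pandas and numpy are installed (the file and column names are hypothetical):

    import numpy as np
    import pandas as pd

    # Load a table of measurements from a CSV file (hypothetical file and columns).
    df = pd.read_csv("measurements.csv")

    # Derive a new column and summarise it per sample group.
    df["log_value"] = np.log(df["value"])
    summary = df.groupby("sample")["log_value"].agg(["mean", "std"])
    print(summary)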

Data visualization

Students will be taught different data visualization strategies. They will learn which visualizations to use in which situations, and how to make visualizations accessible to people with color blindness.

They will also learn how to make visualizations in Python using industry-standard plotting libraries.
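
As a small, hypothetical illustration of what this can look like with matplotlib, using one of the colour-blind-friendly style sheets that ships with the library:

    import matplotlib.pyplot as plt
    import numpy as np

    plt.style.use("tableau-colorblind10")  # colour-blind-friendly colour cycle

    x = np.linspace(0, 10, 100)
    fig, ax = plt.subplots()
    ax.plot(x, np.sin(x), label="signal A")
    ax.plot(x, np.cos(x), label="signal B", linestyle="--")  # vary line style, not only colour
    ax.set_xlabel("time (s)")
    ax.set_ylabel("amplitude")
    ax.legend()
    fig.savefig("example.png", dpi=200)

Varying line styles and markers in addition to colour is one simple way to keep a plot readable for colour-blind readers.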

High performance computing

In this module students will learn the difference between computing on their laptop and on a high-performance computing cluster. 

They will learn about strategies to make optimal use of HPC clusters, and how they might use HPC in their own research project. 
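
The module focuses on clusters and job schedulers rather than any particular code, but as a laptop-scale illustration of the kind of parallelism that a cluster scales up, here is a minimal Python sketch (the analyse function is a hypothetical stand-in for an expensive, independent task):

    from multiprocessing import Pool

    def analyse(sample_id):
        # Stand-in for an expensive, independent analysis step.
        return sample_id, sample_id ** 2

    if __name__ == "__main__":
        # Run independent tasks in parallel on local CPU cores; on a cluster,
        # a job scheduler distributes similar independent jobs across many nodes.
        with Pool(processes=4) as pool:
            results = pool.map(analyse, range(20))
        print(results[:5])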

Research data management

Students will learn about the best practices of research data management. This will include an approach to making research data FAIR (findable, accessible, interoperable, reusable). They will also learn about GDPR compliance and data life cycle management.  

Version control & collaboration

Students will learn about Git, a decentralized version control system widely used for source code and other plain-text files. Additionally, they will learn about version control in a collaborative setting, where multiple researchers work on the same files.

Writing readable code

Here we discuss how students can ensure that the code they write is easily readable, with clear and easy-to-follow logic. This helps in getting consistent results from data analyses.
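
A small, made-up example of the habits discussed: descriptive names, a short function with a single purpose, and a docstring that states what the function returns:

    def mean_above_threshold(values, threshold):
        """Return the mean of the values that exceed the given threshold."""
        selected = [v for v in values if v > threshold]
        if not selected:
            return None  # make the empty case explicit for the caller
        return sum(selected) / len(selected)

    print(mean_above_threshold([1.2, 3.4, 0.5, 2.8], threshold=1.0))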

Using Python Notebooks for communication (A1)

Notebooks are a great tool for communicating science and scientific results. In this module we teach students how to use the interactive capabilities of Jupyter notebooks as an effective means of communication.
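
As a hypothetical example of those interactive capabilities, a notebook cell using the ipywidgets library (assumed to be installed alongside Jupyter) to let a reader explore a parameter with a slider:

    import numpy as np
    import matplotlib.pyplot as plt
    from ipywidgets import interact

    def plot_wave(frequency=1.0):
        # Redrawn every time the reader moves the slider.
        x = np.linspace(0, 2 * np.pi, 200)
        plt.plot(x, np.sin(frequency * x))
        plt.ylim(-1.1, 1.1)
        plt.show()

    interact(plot_wave, frequency=(0.5, 5.0, 0.5))  # adds a slider above the plot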

Digital project management (A2)

In this module, students will learn how to work effectively on digital projects, with a focus on scientific data analysis. We will discuss software versioning, effective collaboration, and ensuring that a project can be taken over by colleagues.

Introduction to machine learning & AI (B1)

We discuss the basics of machine learning and AI, and the different kinds of problems that can be solved using these techniques. We will also discuss how these methods could be used in the students' own fields.
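
As a minimal sketch of the kind of supervised-learning workflow introduced, assuming scikit-learn and one of its bundled example datasets:

    from sklearn.datasets import load_iris
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Split a small example dataset into training and test sets.
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

    # Fit a classifier and check how well it generalises to unseen data.
    model = RandomForestClassifier(random_state=0)
    model.fit(X_train, y_train)
    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))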

Ethics of AI (B2)

The use of AI, and especially generative AI, comes with a host of ethical dilemmas. We will make students aware of these dilemmas, and discuss how they apply to their own work. 

 

ML and AI in the afternoon

Target group: PhD students + Researchers at Chalmers

Level: Introductory course, Basic level

The course aims to provide the participants (researchers) with an overview of the general concepts and fundamentals of Artificial Intelligence (AI) and Machine Learning (ML). It offers an understanding of ML terminology and techniques: supervised learning (Regression and Classification) and unsupervised learning (Clustering and Association).

The course shows participants how to apply various ML methods to solve different types of problems and how to choose between the ML algorithms that fit the application. The researchers can then build their own ML applications and learn the practical aspects of evaluation, such as validation techniques and understanding the metrics.
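
As a hypothetical illustration of what validation techniques and metrics can look like in code, a cross-validation sketch using scikit-learn and one of its bundled datasets:

    from sklearn.datasets import load_breast_cancer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Evaluate a classifier with 5-fold cross-validation instead of a single split,
    # giving a more robust estimate of performance on unseen data.
    X, y = load_breast_cancer(return_X_y=True)
    model = LogisticRegression(max_iter=5000)
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    print("accuracy per fold:", scores.round(3))
    print("mean accuracy:", scores.mean().round(3))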

Furthermore, the purpose is also to understand the different forms of data (images, text, numerical, and others), in addition to data preparation. The course is also customised according to the targeted department each time, so that the participants can see where ML could be applicable in their field.

In addition, the course gives the participants a quick overview of Deep Learning and introduces them to the most well-known networks.

Finally, there will be a coding session where we apply the concepts covered during the course to a real example from the participants' own domain.

Registration is open for the Department of Architecture and Civil Engineering (no later than 15 October 2023): here

Practical Introduction to computer clusters

Target group: Students in courses that will be using Vera or Alvis

Level: Basic (those who can’t follow along in the Vera and Alvis intros would probably benefit from starting with this one before trying again)

This online self-study course introduces concepts that are good to know when working with most computer clusters, including but not limited to working with the Linux command line, the SLURM job scheduler, and containers. While the material is specifically tailored to the Vera and Alvis computer clusters, the content is valuable when working with most larger computer clusters.

Introduction to Vera

Target group: New users of Vera

Level: Familiarity with computers is expected

A 2-hour seminar introducing the Vera computer cluster and how to work with it. All new users are expected to participate, and those who are unable to attend should read through the slides.

[Image: Alvis, the new national large-scale computer/data resource for AI and Machine Learning, situated in the MC2 building. Photographer: Henrik Sandsjö]

Introduction to Alvis

Target group: New users of Alvis

Level: Familiarity with computers is expected

A 2-hour seminar introducing the Alvis computer cluster and how to work with it. In connection with the seminar, a 2-hour workshop with exercises for getting started with machine learning on Alvis is usually held. All new users are expected to participate in the seminar, and those who are unable to attend should read through the slides.

Introduction to Research Data Management (RDM)

Target group: PhD students + Researchers [can be adapted]

Level: Beginner

The course's name and contents can be adapted depending on the target audience. We have a standard set of base materials covering the basics of RDM; these points will always be covered during such a course, but the focus will vary depending on the audience.

Computer Vision

This course introduces methods for analyzing and understanding images from a Machine Learning perspective. Computer Vision aims to infer something about the world from observed image data, with applications such as image classification, localization, object detection, and segmentation.

The course shows examples of dealing with such problems using methods in the field of Deep Neural Networks and the concept of feature learning. We also look at recent advancements in generative models and self-supervised learning.
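
The course is not tied to a particular framework, but as a minimal sketch of what a convolutional model looks like in code, assuming PyTorch (all sizes here are made up):

    import torch
    import torch.nn as nn

    # A tiny convolutional classifier for 3-channel 32x32 images with 10 classes.
    model = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learn local image features
        nn.ReLU(),
        nn.MaxPool2d(2),                             # downsample 32x32 -> 16x16
        nn.Flatten(),
        nn.Linear(16 * 16 * 16, 10),                 # map learned features to class scores
    )

    dummy_batch = torch.randn(4, 3, 32, 32)          # four random "images"
    print(model(dummy_batch).shape)                  # torch.Size([4, 10])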

Visualization

Visualization is an essential part of the data science pipeline, either for presenting and delivering a message to a specific audience or as a tool for exploration to get more understanding of a dataset. The main goal is to help our understanding of the data by utilizing our human ability to find patterns visually.

This course shows various ways of working with the visualization of data through different kinds of tools and visual elements. We also discuss the importance of visual interaction and how to deal with high-dimensional data, and show examples of working with geospatial and temporal data.
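
As one hypothetical example of dealing with high-dimensional data, projecting it to two dimensions before plotting, assuming scikit-learn and matplotlib are installed:

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_digits
    from sklearn.decomposition import PCA

    # Project 64-dimensional digit images down to two dimensions for plotting.
    X, y = load_digits(return_X_y=True)
    coords = PCA(n_components=2).fit_transform(X)

    fig, ax = plt.subplots()
    points = ax.scatter(coords[:, 0], coords[:, 1], c=y, s=8, cmap="tab10")
    ax.set_xlabel("first principal component")
    ax.set_ylabel("second principal component")
    fig.colorbar(points, ax=ax, label="digit")
    fig.savefig("digits_pca.png", dpi=200)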