Seminar
The event has passed

Predicting interaction partners and generating new protein sequences using protein language models

AI for Science seminar with Anne-Florence Bitbol, EPFL.

Overview

  • Date: Starts 10 October 2024, 15:00; ends 10 October 2024, 16:30
  • Seats available: 40
  • Location: EDIT Analysen or online
  • Language: English

Zoom password: ai4science

Dr. Anne-Florence Bitbol, EPFL.

The on-site event will be followed by fika in the Analysen coffee area (fika from 16:00-16:30).

"Predicting interaction partners and generating new protein sequences using protein language models"

Talk abstract:

Protein sequences are shaped by functional optimization on the one hand and by evolutionary history, i.e. phylogeny, on the other hand. A multiple sequence alignment of homologous proteins contains sequences which evolved from the same ancestral sequence and have similar structure and function. In such an alignment, statistical patterns in amino-acid usage at different sites encode structural and functional constraints.
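
As a toy illustration of such statistical patterns (not the method presented in the talk), the sketch below computes per-site entropy and the mutual information between two columns of a small alignment. The sequences and the helper names `column`, `entropy` and `mutual_information` are hypothetical and chosen only for this example; real analyses use far larger alignments and correct for phylogeny and finite sampling.

```python
from collections import Counter
from math import log2

# Toy alignment (made-up sequences, not real data):
# each row is a homolog, each column is a site.
msa = [
    "ACDA",
    "ACDA",
    "GCEA",
    "GCEA",
]


def column(msa, i):
    """Return the residues observed at site i across all sequences."""
    return [seq[i] for seq in msa]


def entropy(symbols):
    """Shannon entropy (in bits) of the empirical distribution of symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(msa, i, j):
    """Mutual information (in bits) between sites i and j: H(i) + H(j) - H(i, j)."""
    pairs = list(zip(column(msa, i), column(msa, j)))
    return entropy(column(msa, i)) + entropy(column(msa, j)) - entropy(pairs)


if __name__ == "__main__":
    # Sites 0 and 2 vary together in this toy alignment; site 1 is fully conserved.
    print(mutual_information(msa, 0, 2))
    print(mutual_information(msa, 0, 1))
```

In this toy alignment, sites 0 and 2 always change together, so they share information, whereas the fully conserved site 1 shares none with any other site. Conserved columns and covarying column pairs are the kind of signal that structural and functional constraints leave in an alignment.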


Protein language models trained on multiple sequence alignments capture coevolution between sites and structural contacts, but also phylogenetic relationships. I will discuss a method we recently proposed that leverages these models to predict which proteins interact among the paralogs of two protein families, and improves the prediction of the structure of some protein complexes. Next, I will show that these models can be used to generate new protein sequences from given protein families.

While multiple sequence alignments are very useful, their construction is imperfect. To address these limitations, we developed ProtMamba, a homology-aware but alignment-free protein language model based on the Mamba architecture, which efficiently uses long contexts. I will show that ProtMamba has promising generative properties, and is able to predict fitness.

About the speaker:

Anne-Florence Bitbol is an Assistant Professor at the Swiss Federal Institute of Technology in Lausanne (EPFL), where she leads the Laboratory of Computational Biology and Theoretical Biophysics, within the Institute of Bioengineering, also affiliated with the Swiss Institute of Bioinformatics. She studied physics at ENS Lyon, and obtained her PhD in 2012 at Université Paris-Diderot, advised by Jean-Baptiste Fournier. She then joined the Princeton Biophysics Theory group led by William Bialek, Curtis Callan and Ned Wingreen, as an HFSP Postdoctoral Fellow. In 2016 she became an independent CNRS researcher at Laboratoire Jean Perrin of Sorbonne Université in Paris, before joining EPFL in 2020.

Anne-Florence is broadly interested in understanding biological phenomena through physical concepts and mathematical and computational tools. She investigates the impacts of optimization and historical contingency in biological systems, from the molecular to the population scales. She studies how the protein sequence-function relationship is affected by phylogeny and physical constraints, and she develops inference methods from protein sequences, e.g. to predict protein-protein interactions from sequences. These methods are based on information theory, statistical physics and machine learning. She also assesses how microbial population evolution is impacted by spatial structure and environment changes, with applications to antibiotic resistance evolution, and to the evolution of bacteria in the gut. She currently holds an ERC Starting Grant.

Structured learning

This theme focuses on how to make use of structure in data to build machine learning (ML) and artificial intelligence (AI) systems which are safer, more trustworthy and generalize better. Structure includes the relationships between data points in time and space, and how the predictions change when data is transformed in specific ways, for example rotated or scaled. These topics are abstract and general but have a direct impact on the use of AI and ML in the sciences and in applications such as drug and materials design, or medical imaging.