Novel machine learning approaches revolutionize protein knowledge

N Bordin, C Dallago, M Heinzinger, S Kim… - Trends in Biochemical …, 2023 - cell.com
Breakthrough methods in machine learning (ML), protein structure prediction, and novel
ultrafast structural aligners are revolutionizing structural biology. Obtaining accurate models …

Representation learning applications in biological sequence analysis

H Iuchi, T Matsutani, K Yamada, N Iwano… - Computational and …, 2021 - Elsevier
Although remarkable advances have been reported in high-throughput sequencing, the
ability to aptly analyze a substantial amount of rapidly generated biological …

Nucleotide Transformer: building and evaluating robust foundation models for human genomics

H Dalla-Torre, L Gonzalez, J Mendoza-Revilla… - Nature …, 2024 - nature.com
The prediction of molecular phenotypes from DNA sequences remains a longstanding
challenge in genomics, often driven by limited annotated data and the inability to transfer …

Bilingual language model for protein sequence and structure

M Heinzinger, K Weissenow… - NAR Genomics and …, 2024 - academic.oup.com
Adapting language models to protein sequences spawned the development of powerful
protein language models (pLMs). Concurrently, AlphaFold2 broke through in protein …

Exploring the limits of out-of-distribution detection

S Fort, J Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
Near out-of-distribution detection (OOD) is a major challenge for deep neural networks. We
demonstrate that large-scale pre-trained transformers can significantly improve the state-of …

ProtTrans: Toward understanding the language of life through self-supervised learning

A Elnaggar, M Heinzinger, C Dallago… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Computational biology and bioinformatics provide vast data gold-mines from protein
sequences, ideal for Language Models (LMs) taken from Natural Language Processing …

Learning functional properties of proteins with language models

S Unsal, H Atas, M Albayrak, K Turhan… - Nature Machine …, 2022 - nature.com
Data-centric approaches have been used to develop predictive methods for elucidating
uncharacterized properties of proteins; however, studies indicate that these methods should …

PredictProtein - predicting protein structure and function for 29 years

M Bernhofer, C Dallago, T Karl… - Nucleic acids …, 2021 - academic.oup.com
Since 1992 PredictProtein (https://predictprotein.org) is a one-stop online resource
for protein sequence analysis with its main site hosted at the Luxembourg Centre for …

Protein generation with evolutionary diffusion: sequence is all you need

S Alamdari, N Thakkar, R van den Berg, N Tenenholtz… - BioRxiv, 2023 - biorxiv.org
Deep generative models are increasingly powerful tools for the in silico design of novel
proteins. Recently, a family of generative models called diffusion models has demonstrated …

Rhea, the reaction knowledgebase in 2022

P Bansal, A Morgat, KB Axelsen… - Nucleic acids …, 2022 - academic.oup.com
Rhea (https://www.rhea-db.org) is an expert-curated knowledgebase of
biochemical reactions based on the chemical ontology ChEBI (Chemical Entities of …