A guide to machine learning for biologists

JG Greener, SM Kandathil, L Moffat… - Nature Reviews Molecular …, 2022 - nature.com
The expanding scale and inherent complexity of biological data have encouraged a growing
use of machine learning in biology to build informative and predictive models of the …

Machine learning in protein structure prediction

M AlQuraishi - Current Opinion in Chemical Biology, 2021 - Elsevier
Prediction of protein structure from sequence has been intensely studied for many decades,
owing to the problem's importance and its uniquely well-defined physical and computational …

Evolutionary-scale prediction of atomic-level protein structure with a language model

Z Lin, H Akin, R Rao, B Hie, Z Zhu, W Lu, N Smetanin… - Science, 2023 - science.org
Recent advances in machine learning have leveraged evolutionary information in multiple
sequence alignments to predict protein structure. We demonstrate direct inference of full …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Accurate prediction of protein structures and interactions using a three-track neural network

M Baek, F DiMaio, I Anishchenko, J Dauparas… - Science, 2021 - science.org
DeepMind presented notably accurate predictions at the recent 14th Critical Assessment of
Structure Prediction (CASP14) conference. We explored network architectures that …

ProtTrans: Toward understanding the language of life through self-supervised learning

A Elnaggar, M Heinzinger, C Dallago… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Computational biology and bioinformatics provide vast data gold-mines from protein
sequences, ideal for Language Models (LMs) taken from Natural Language Processing …

Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations

W Zheng, C Zhang, Y Li, R Pearce, EW Bell… - Cell Reports Methods, 2021 - cell.com
Structure prediction for proteins lacking homologous templates in the Protein Data Bank
(PDB) remains a significant unsolved problem. We developed a protocol, CI-TASSER, to …

WILDS: A benchmark of in-the-wild distribution shifts

PW Koh, S Sagawa, H Marklund… - International …, 2021 - proceedings.mlr.press
Distribution shifts—where the training distribution differs from the test distribution—can
substantially degrade the accuracy of machine learning (ML) systems deployed in the wild …

MSA Transformer

RM Rao, J Liu, R Verkuil, J Meier… - International …, 2021 - proceedings.mlr.press
Unsupervised protein language models trained across millions of diverse sequences learn
structure and function of proteins. Protein language models studied to date have been …

Learning the protein language: Evolution, structure, and function

T Bepler, B Berger - Cell Systems, 2021 - cell.com
Language models have recently emerged as a powerful machine-learning approach for
distilling information from massive protein sequence databases. From readily available …