A guide to machine learning for biologists

JG Greener, SM Kandathil, L Moffat… - Nature Reviews Molecular …, 2022 - nature.com
The expanding scale and inherent complexity of biological data have encouraged a growing
use of machine learning in biology to build informative and predictive models of the …

The language of proteins: NLP, machine learning & protein sequences

D Ofer, N Brandes, M Linial - Computational and Structural Biotechnology …, 2021 - Elsevier
Natural language processing (NLP) is a field of computer science concerned with automated
text and language analysis. In recent years, following a series of breakthroughs in deep and …

Evolutionary-scale prediction of atomic-level protein structure with a language model

Z Lin, H Akin, R Rao, B Hie, Z Zhu, W Lu, N Smetanin… - Science, 2023 - science.org
Recent advances in machine learning have leveraged evolutionary information in multiple
sequence alignments to predict protein structure. We demonstrate direct inference of full …

Robust deep learning–based protein sequence design using ProteinMPNN

J Dauparas, I Anishchenko, N Bennett, H Bai… - Science, 2022 - science.org
Although deep learning has revolutionized protein structure prediction, almost all
experimentally characterized de novo protein designs have been generated using …

ColabFold: making protein folding accessible to all

M Mirdita, K Schütze, Y Moriwaki, L Heo… - Nature …, 2022 - nature.com
ColabFold offers accelerated prediction of protein structures and complexes by combining
the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold's 40−60 …

Highly accurate protein structure prediction with AlphaFold

J Jumper, R Evans, A Pritzel, T Green, M Figurnov… - Nature, 2021 - nature.com
Proteins are essential to life, and understanding their structure can facilitate a mechanistic
understanding of their function. Through an enormous experimental effort [1–4], the …

BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes

M Manni, MR Berkeley, M Seppey… - Molecular Biology …, 2021 - academic.oup.com
Methods for evaluating the quality of genomic and metagenomic data are essential to aid
genome assembly procedures and to correctly interpret the results of subsequent analyses …

Sensitive protein alignments at tree-of-life scale using DIAMOND

B Buchfink, K Reuter, HG Drost - Nature Methods, 2021 - nature.com
We are at the beginning of a genomic revolution in which all known species are planned to
be sequenced. Accessing such data for comparative analyses is crucial in this new age of …

Fast and accurate protein structure search with Foldseek

M van Kempen, SS Kim, C Tumescheit, M Mirdita… - Nature …, 2024 - nature.com
As structure prediction methods are generating millions of publicly available protein
structures, searching these databases is becoming a bottleneck. Foldseek aligns the …

Identification of mobile genetic elements with geNomad

AP Camargo, S Roux, F Schulz, M Babinski, Y Xu… - Nature …, 2024 - nature.com
Identifying and characterizing mobile genetic elements in sequencing data is essential for
understanding their diversity, ecology, biotechnological applications and impact on public …