A guide to machine learning for biologists
The expanding scale and inherent complexity of biological data have encouraged a growing
use of machine learning in biology to build informative and predictive models of the …
use of machine learning in biology to build informative and predictive models of the …
[HTML][HTML] The language of proteins: NLP, machine learning & protein sequences
Natural language processing (NLP) is a field of computer science concerned with automated
text and language analysis. In recent years, following a series of breakthroughs in deep and …
text and language analysis. In recent years, following a series of breakthroughs in deep and …
Evolutionary-scale prediction of atomic-level protein structure with a language model
Recent advances in machine learning have leveraged evolutionary information in multiple
sequence alignments to predict protein structure. We demonstrate direct inference of full …
sequence alignments to predict protein structure. We demonstrate direct inference of full …
Robust deep learning–based protein sequence design using ProteinMPNN
Although deep learning has revolutionized protein structure prediction, almost all
experimentally characterized de novo protein designs have been generated using …
experimentally characterized de novo protein designs have been generated using …
ColabFold: making protein folding accessible to all
ColabFold offers accelerated prediction of protein structures and complexes by combining
the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold's 40− 60 …
the fast homology search of MMseqs2 with AlphaFold2 or RoseTTAFold. ColabFold's 40− 60 …
[HTML][HTML] Highly accurate protein structure prediction with AlphaFold
Proteins are essential to life, and understanding their structure can facilitate a mechanistic
understanding of their function. Through an enormous experimental effort 1, 2, 3, 4, the …
understanding of their function. Through an enormous experimental effort 1, 2, 3, 4, the …
BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes
Methods for evaluating the quality of genomic and metagenomic data are essential to aid
genome assembly procedures and to correctly interpret the results of subsequent analyses …
genome assembly procedures and to correctly interpret the results of subsequent analyses …
Sensitive protein alignments at tree-of-life scale using DIAMOND
We are at the beginning of a genomic revolution in which all known species are planned to
be sequenced. Accessing such data for comparative analyses is crucial in this new age of …
be sequenced. Accessing such data for comparative analyses is crucial in this new age of …
Fast and accurate protein structure search with Foldseek
As structure prediction methods are generating millions of publicly available protein
structures, searching these databases is becoming a bottleneck. Foldseek aligns the …
structures, searching these databases is becoming a bottleneck. Foldseek aligns the …
Identification of mobile genetic elements with geNomad
Identifying and characterizing mobile genetic elements in sequencing data is essential for
understanding their diversity, ecology, biotechnological applications and impact on public …
understanding their diversity, ecology, biotechnological applications and impact on public …