Proteingym: Large-scale benchmarks for protein fitness prediction and design
Predicting the effects of mutations in proteins is critical to many applications, from
understanding genetic disease to designing novel proteins to address our most pressing …
understanding genetic disease to designing novel proteins to address our most pressing …
Evaluating generalizability of artificial intelligence models for molecular datasets
Deep learning has made rapid advances in modelling molecular sequencing data. Despite
achieving high performance on benchmarks, it remains unclear to what extent deep learning …
achieving high performance on benchmarks, it remains unclear to what extent deep learning …
Peer: a comprehensive and multi-task benchmark for protein sequence understanding
We are now witnessing significant progress of deep learning methods in a variety of tasks
(or datasets) of proteins. However, there is a lack of a standard benchmark to evaluate the …
(or datasets) of proteins. However, there is a lack of a standard benchmark to evaluate the …
[HTML][HTML] Proteingym: Large-scale benchmarks for protein design and fitness prediction
Predicting the effects of mutations in proteins is critical to many applications, from
understanding genetic disease to designing novel proteins that can address our most …
understanding genetic disease to designing novel proteins that can address our most …
Evaluating representation learning on the protein structure universe
We introduce ProteinWorkshop, a comprehensive benchmark suite for representation
learning on protein structures with Geometric Graph Neural Networks. We consider large …
learning on protein structures with Geometric Graph Neural Networks. We consider large …
Ten quick tips for sequence-based prediction of protein properties using machine learning
The ubiquitous availability of genome sequencing data explains the popularity of machine
learning-based methods for the prediction of protein properties from their amino acid …
learning-based methods for the prediction of protein properties from their amino acid …
PETA: evaluating the impact of protein transfer learning with sub-word tokenization on downstream applications
Protein language models (PLMs) play a dominant role in protein representation learning.
Most existing PLMs regard proteins as sequences of 20 natural amino acids. The problem …
Most existing PLMs regard proteins as sequences of 20 natural amino acids. The problem …
Protein engineering in the deep learning era
Advances in deep learning have significantly aided protein engineering in addressing
challenges in industrial production, healthcare, and environmental sustainability. This …
challenges in industrial production, healthcare, and environmental sustainability. This …
Contrasting Sequence with Structure: Pre-training Graph Representations with PLMs
Understanding protein function is vital for drug discovery, disease diagnosis, and protein
engineering. While Protein Language Models (PLMs) pre-trained on vast protein sequence …
engineering. While Protein Language Models (PLMs) pre-trained on vast protein sequence …
Collectively encoding protein properties enriches protein language models
J An, X Weng - BMC bioinformatics, 2022 - Springer
Pre-trained natural language processing models on a large natural language corpus can
naturally transfer learned knowledge to protein domains by fine-tuning specific in-domain …
naturally transfer learned knowledge to protein domains by fine-tuning specific in-domain …