Scientific large language models: A survey on biological & chemical domains

Q Zhang, K Ding, T Lv, X Wang, Q Yin, Y Zhang… - ACM Computing …, 2024 - dl.acm.org
Large Language Models (LLMs) have emerged as a transformative power in enhancing
natural language comprehension, representing a significant stride toward artificial general …

Machine learning-guided protein engineering

P Kouba, P Kohout, F Haddadi, A Bushuiev… - ACS …, 2023 - ACS Publications
Recent progress in engineering highly promising biocatalysts has increasingly involved
machine learning methods. These methods leverage existing experimental and simulation …

Simulating 500 million years of evolution with a language model

T Hayes, R Rao, H Akin, NJ Sofroniew, D Oktay, Z Lin… - Science, 2025 - science.org
More than three billion years of evolution have produced an image of biology encoded into
the space of natural proteins. Here we show that language models trained at scale on …

Evolutionary-scale prediction of atomic-level protein structure with a language model

Z Lin, H Akin, R Rao, B Hie, Z Zhu, W Lu, N Smetanin… - Science, 2023 - science.org
Recent advances in machine learning have leveraged evolutionary information in multiple
sequence alignments to predict protein structure. We demonstrate direct inference of full …

ProtGPT2 is a deep unsupervised language model for protein design

N Ferruz, S Schmidt, B Höcker - Nature Communications, 2022 - nature.com
Protein design aims to build novel proteins customized for specific purposes, thereby
holding the potential to tackle many environmental and biomedical problems. Recent …

Sequence modeling and design from molecular to genome scale with Evo

E Nguyen, M Poli, MG Durrant, B Kang, D Katrekar… - Science, 2024 - science.org
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an
organism's function. We present Evo, a long-context genomic foundation model with a …

Genome-wide prediction of disease variant effects with a deep protein language model

N Brandes, G Goldman, CH Wang, CJ Ye, V Ntranos - Nature Genetics, 2023 - nature.com
Predicting the effects of coding variants is a major challenge. While recent deep-learning
models have improved variant effect prediction accuracy, they cannot analyze all coding …

Efficient evolution of human antibodies from general protein language models

BL Hie, VR Shanker, D Xu, TUJ Bruun… - Nature …, 2024 - nature.com
Natural evolution must explore a vast landscape of possible sequences for desirable yet
rare mutations, suggesting that learning from natural evolutionary strategies could guide …

ProteinGym: Large-scale benchmarks for protein fitness prediction and design

P Notin, A Kollasch, D Ritter… - Advances in …, 2024 - proceedings.neurips.cc
Predicting the effects of mutations in proteins is critical to many applications, from
understanding genetic disease to designing novel proteins to address our most pressing …

ProtST: Multi-modality learning of protein sequences and biomedical texts

M Xu, X Yuan, S Miret, J Tang - International Conference on …, 2023 - proceedings.mlr.press
Current protein language models (PLMs) learn protein representations mainly based on
their sequences, thereby well capturing co-evolutionary information, but they are unable to …