Machine learning for functional protein design

P Notin, N Rollins, Y Gal, C Sander, D Marks - Nature biotechnology, 2024 - nature.com
Recent breakthroughs in AI coupled with the rapid accumulation of protein sequence and
structure data have radically transformed computational protein design. New methods …

Scientific large language models: A survey on biological & chemical domains

Q Zhang, K Ding, T Lv, X Wang, Q Yin, Y Zhang… - ACM Computing …, 2024 - dl.acm.org
Large Language Models (LLMs) have emerged as a transformative power in enhancing
natural language comprehension, representing a significant stride toward artificial general …

Simulating 500 million years of evolution with a language model

T Hayes, R Rao, H Akin, NJ Sofroniew, D Oktay, Z Lin… - Science, 2025 - science.org
More than three billion years of evolution have produced an image of biology encoded into
the space of natural proteins. Here we show that language models trained at scale on …

Sequence modeling and design from molecular to genome scale with Evo

E Nguyen, M Poli, MG Durrant, B Kang, D Katrekar… - Science, 2024 - science.org
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an
organism's function. We present Evo, a long-context genomic foundation model with a …

Artificial intelligence for science in quantum, atomistic, and continuum systems

X Zhang, L Wang, J Helwig, Y Luo, C Fu, Y Xie… - arXiv preprint arXiv …, 2023 - arxiv.org
Advances in artificial intelligence (AI) are fueling a new paradigm of discoveries in natural
sciences. Today, AI has started to advance natural sciences by improving, accelerating, and …

Guiding questions to avoid data leakage in biological machine learning applications

J Bernett, DB Blumenthal, DG Grimm, F Haselbeck… - Nature …, 2024 - nature.com
Machine learning methods for extracting patterns from high-dimensional data are
very important in the biological sciences. However, in certain cases, real-world applications …

xTrimoPGLM: unified 100B-scale pre-trained transformer for deciphering the language of protein

B Chen, X Cheng, P Li, Y Geng, J Gong, S Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Protein language models have shown remarkable success in learning biological information
from protein sequences. However, most existing models are limited by either autoencoding …

BEACON: Benchmark for comprehensive RNA tasks and language models

Y Ren, Z Chen, L Qiao, H Jing, Y Cai… - Advances in …, 2025 - proceedings.neurips.cc
RNA plays a pivotal role in translating genetic instructions into functional outcomes,
underscoring its importance in biological processes and disease mechanisms. Despite the …

Rapid in silico directed evolution by a protein language model with EVOLVEpro

K Jiang, Z Yan, M Di Bernardo, SR Sgrizzi, L Villiger… - Science, 2024 - science.org
Directed protein evolution is central to biomedical applications but faces challenges like
experimental complexity, inefficient multi-property optimization, and local maxima traps …

A general temperature-guided language model to design proteins of enhanced stability and activity

F Jiang, M Li, J Dong, Y Yu, X Sun, B Wu, J Huang… - Science …, 2024 - science.org
Designing protein mutants with both high stability and activity is a critical yet challenging
task in protein engineering. Here, we introduce PRIME, a deep learning model, which can …