Scientific large language models: A survey on biological & chemical domains

Q Zhang, K Ding, T Lv, X Wang, Q Yin, Y Zhang… - ACM Computing …, 2024 - dl.acm.org
Large Language Models (LLMs) have emerged as a transformative power in enhancing
natural language comprehension, representing a significant stride toward artificial general …

Opportunities and challenges for machine learning-assisted enzyme engineering

J Yang, FZ Li, FH Arnold - ACS Central Science, 2024 - ACS Publications
Enzymes can be engineered at the level of their amino acid sequences to optimize key
properties such as expression, stability, substrate range, and catalytic efficiency, or even to …

ProteinNPT: Improving protein property prediction and design with non-parametric transformers

P Notin, R Weitzman, D Marks… - Advances in Neural …, 2023 - proceedings.neurips.cc
Protein design holds immense potential for optimizing naturally occurring proteins, with
broad applications in drug discovery, material design, and sustainability. However …

Computational scoring and experimental evaluation of enzymes generated by neural networks

SR Johnson, X Fu, S Viknander, C Goldin… - Nature …, 2024 - nature.com
In recent years, generative protein sequence models have been developed to sample novel
sequences. However, predicting whether generated proteins will fold and function remains …

OpenProteinSet: Training data for structural biology at scale

G Ahdritz, N Bouatta, S Kadyan… - Advances in …, 2024 - proceedings.neurips.cc
Multiple sequence alignments (MSAs) of proteins encode rich biological information and
have been workhorses in bioinformatic methods for tasks like protein design and protein …

A new age in protein design empowered by deep learning

H Khakzad, I Igashov, A Schneuing, C Goverde… - Cell Systems, 2023 - cell.com
The rapid progress in the field of deep learning has had a significant impact on protein
design. Deep learning methods have recently produced a breakthrough in protein structure …

Are protein language models the new universal key?

K Weissenow, B Rost - Current Opinion in Structural Biology, 2025 - Elsevier
Protein language models (pLMs) capture some aspects of the grammar of the language of
life as written in protein sequences. The so-called pLM embeddings implicitly contain this …

Machine learning in biological physics: From biomolecular prediction to design

J Martin, M Lequerica Mateos, JN Onuchic… - Proceedings of the …, 2024 - pnas.org
Machine learning has been proposed as an alternative to theoretical modeling when
dealing with complex problems in biological physics. However, in this perspective, we argue …

Context-aware geometric deep learning for protein sequence design

LF Krapp, FA Meireles, LA Abriata, J Devillard… - Nature …, 2024 - nature.com
Protein design and engineering are evolving at an unprecedented pace leveraging the
advances in deep learning. Current models nonetheless cannot natively consider non …

Latent generative landscapes as maps of functional diversity in protein sequence space

C Ziegler, J Martin, C Sinner, F Morcos - Nature Communications, 2023 - nature.com
Variational autoencoders are unsupervised learning models with generative capabilities;
when applied to protein data, they classify sequences by phylogeny and generate de novo …