Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

Scientific large language models: A survey on biological & chemical domains

Q Zhang, K Ding, T Lv, X Wang, Q Yin, Y Zhang… - ACM Computing …, 2024 - dl.acm.org
Large Language Models (LLMs) have emerged as a transformative power in enhancing
natural language comprehension, representing a significant stride toward artificial general …

De novo design of protein structure and function with RFdiffusion

JL Watson, D Juergens, NR Bennett, BL Trippe, J Yim… - Nature, 2023 - nature.com
There has been considerable recent progress in designing new proteins using deep-learning
methods. Despite this progress, a general deep-learning framework for protein …

BloombergGPT: A large language model for finance

S Wu, O Irsoy, S Lu, V Dabravolski, M Dredze… - arXiv preprint arXiv …, 2023 - arxiv.org
The use of NLP in the realm of financial technology is broad and complex, with applications
ranging from sentiment analysis and named entity recognition to question answering. Large …

Fast and accurate protein structure search with Foldseek

M van Kempen, SS Kim, C Tumescheit, M Mirdita… - Nature …, 2024 - nature.com
As structure prediction methods are generating millions of publicly available protein
structures, searching these databases is becoming a bottleneck. Foldseek aligns the …

Galactica: A large language model for science

R Taylor, M Kardas, G Cucurull, T Scialom… - arXiv preprint arXiv …, 2022 - arxiv.org
Information overload is a major obstacle to scientific progress. The explosive growth in
scientific literature and data has made it ever harder to discover useful insights in a large …

HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution

E Nguyen, M Poli, M Faizi, A Thomas… - Advances in neural …, 2024 - proceedings.neurips.cc
Genomic (DNA) sequences encode an enormous amount of information for gene regulation
and protein synthesis. Similar to natural language models, researchers have proposed …

OpenFold: Retraining AlphaFold2 yields new insights into its learning mechanisms and capacity for generalization

G Ahdritz, N Bouatta, C Floristean, S Kadyan, Q Xia… - Nature …, 2024 - nature.com
AlphaFold2 revolutionized structural biology with the ability to predict protein structures with
exceptionally high accuracy. Its implementation, however, lacks the code and data required …

Automated model building and protein identification in cryo-EM maps

K Jamali, L Käll, R Zhang, A Brown, D Kimanius… - Nature, 2024 - nature.com
Interpreting electron cryo-microscopy (cryo-EM) maps with atomic models requires high
levels of expertise and labour-intensive manual intervention in three-dimensional computer …

Foldseek: fast and accurate protein structure search

M van Kempen, SS Kim, C Tumescheit, M Mirdita… - bioRxiv, 2022 - biorxiv.org
Highly accurate structure prediction methods are generating an avalanche of publicly
available protein structures. Searching through these structures is becoming the main …