Obtaining genetics insights from deep learning via explainable artificial intelligence

G Novakovsky, N Dexter, MW Libbrecht… - Nature Reviews …, 2023 - nature.com
Artificial intelligence (AI) models based on deep learning now represent the state of the art
for making functional predictions in genomics research. However, the underlying basis on …

Scientific large language models: A survey on biological & chemical domains

Q Zhang, K Ding, T Lv, X Wang, Q Yin, Y Zhang… - ACM Computing …, 2024 - dl.acm.org
Large Language Models (LLMs) have emerged as a transformative power in enhancing
natural language comprehension, representing a significant stride toward artificial general …

HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution

E Nguyen, M Poli, M Faizi, A Thomas… - Advances in neural …, 2024 - proceedings.neurips.cc
Genomic (DNA) sequences encode an enormous amount of information for gene regulation
and protein synthesis. Similar to natural language models, researchers have proposed …

Nucleotide Transformer: building and evaluating robust foundation models for human genomics

H Dalla-Torre, L Gonzalez, J Mendoza-Revilla… - Nature …, 2024 - nature.com
The prediction of molecular phenotypes from DNA sequences remains a longstanding
challenge in genomics, often driven by limited annotated data and the inability to transfer …

Sequence modeling and design from molecular to genome scale with Evo

E Nguyen, M Poli, MG Durrant, B Kang, D Katrekar… - Science, 2024 - science.org
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an
organism's function. We present Evo, a long-context genomic foundation model with a …

SBSM-Pro: support bio-sequence machine for proteins

Y Wang, Y Zhai, Y Ding, Q Zou - Science China Information Sciences, 2024 - Springer
Proteins play a pivotal role in biological systems. The use of machine learning algorithms for
protein classification can assist and even guide biological experiments, offering crucial …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Effective gene expression prediction from sequence by integrating long-range interactions

Ž Avsec, V Agarwal, D Visentin, JR Ledsam… - Nature …, 2021 - nature.com
How noncoding DNA determines gene expression in different cell types is a major unsolved
problem, and critical downstream applications in human genetics depend on improved …

Exploring the limits of out-of-distribution detection

S Fort, J Ren… - Advances in Neural …, 2021 - proceedings.neurips.cc
Near out-of-distribution detection (OOD) is a major challenge for deep neural networks. We
demonstrate that large-scale pre-trained transformers can significantly improve the state-of …

DeepBIO: an automated and interpretable deep-learning platform for high-throughput biological sequence prediction, functional annotation and visualization analysis

R Wang, Y Jiang, J Jin, C Yin, H Yu… - Nucleic acids …, 2023 - academic.oup.com
Here, we present DeepBIO, the first-of-its-kind automated and interpretable deep-learning
platform for high-throughput biological sequence functional analysis. DeepBIO is a one-stop …