Whole genome sequencing in clinical practice

FO Bagger, L Borgwardt, AS Jespersen… - BMC medical …, 2024 - Springer
Whole genome sequencing (WGS) is becoming the preferred method for molecular genetic
diagnosis of rare and unknown diseases and for identification of actionable cancer drivers …

HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution

E Nguyen, M Poli, M Faizi, A Thomas… - Advances in neural …, 2023 - proceedings.neurips.cc
Genomic (DNA) sequences encode an enormous amount of information for gene regulation
and protein synthesis. Similar to natural language models, researchers have proposed …

Nucleotide Transformer: building and evaluating robust foundation models for human genomics

H Dalla-Torre, L Gonzalez, J Mendoza-Revilla… - Nature …, 2024 - nature.com
The prediction of molecular phenotypes from DNA sequences remains a longstanding
challenge in genomics, often driven by limited annotated data and the inability to transfer …

Transcriptional and post-transcriptional controls for tuning gene expression in plants

V Zhong, BN Archibald, JAN Brophy - Current Opinion in Plant Biology, 2023 - Elsevier
Plant biotechnologists seek to modify plants through genetic reprogramming, but our ability
to precisely control gene expression in plants is still limited. Here, we review transcription …

Caduceus: Bi-directional equivariant long-range DNA sequence modeling

Y Schiff, CH Kao, A Gokaslan, T Dao, A Gu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale sequence modeling has sparked rapid advances that now extend into biology
and genomics. However, modeling genomic sequences introduces challenges such as the …

Self-supervised learning on millions of primary RNA sequences from 72 vertebrates improves sequence-based RNA splicing prediction

K Chen, Y Zhou, M Ding, Y Wang, Z Ren… - Briefings in …, 2024 - academic.oup.com
Language models pretrained by self-supervised learning (SSL) have been widely
utilized to study protein sequences, while few models were developed for genomic …

A foundational large language model for edible plant genomes

J Mendoza-Revilla, E Trop, L Gonzalez… - Communications …, 2024 - nature.com
Significant progress has been made in the field of plant genomics, as demonstrated by the
increased use of high-throughput methodologies that enable the characterization of multiple …

A de novo ARIH2 gene mutation was detected in a patient with autism spectrum disorders and intellectual disability

M Vinci, S Treccarichi, R Galati Rando, A Musumeci… - Scientific Reports, 2024 - nature.com
E3 ubiquitin protein ligase encoded by ARIH2 gene catalyses the ubiquitination of target
proteins and plays a crucial role in posttranslational modifications across various cellular …

Self-supervised learning on millions of pre-mRNA sequences improves sequence-based RNA splicing prediction

K Chen, Y Zhou, M Ding, Y Wang, Z Ren, Y Yang - BioRxiv, 2023 - biorxiv.org
RNA splicing is an important post-transcriptional process of gene expression in
eukaryotic organisms. Here, we developed a novel language model, SpliceBERT, pre …

Unraveling the chicken T cell repertoire with enhanced genome annotation

SP Früh, MA Früh, BB Kaufer, TW Göbel - Frontiers in Immunology, 2024 - frontiersin.org
T cell receptor (TCR) repertoire sequencing has emerged as a powerful tool for
understanding the diversity and functionality of T cells within the host immune system. Yet …