Learning the regulatory code of gene expression

J Zrimec, F Buric, M Kokina, V Garcia… - Frontiers in Molecular …, 2021 - frontiersin.org
Data-driven machine learning is the method of choice for predicting molecular phenotypes
from nucleotide sequence, modeling gene expression events including protein-DNA …

Deep learning-based codon optimization with large-scale synonymous variant datasets enables generalized tunable protein expression

DA Constant, JM Gutierrez, AV Sastry, R Viazzo… - bioRxiv, 2023 - biorxiv.org
Increasing recombinant protein expression is of broad interest in industrial biotechnology,
synthetic biology, and basic research. Codon optimization is an important step in …

Predicting gene sequences with AI to study codon usage patterns

T Sidi, S Bahiri-Elitzur, T Tuller, R Kolodny - Proceedings of the National …, 2025 - pnas.org
Selective pressure acts on the codon use, optimizing multiple, overlapping signals that are
only partially understood. We trained AI models to predict codons given their amino acid …

CodonTransformer: a multispecies codon optimizer using context-aware neural networks

A Fallahpour, V Gureghian, GJ Filion, AB Lindner… - bioRxiv, 2024 - biorxiv.org
The genetic code is degenerate allowing a multitude of possible DNA sequences to encode
the same protein. This degeneracy impacts the efficiency of heterologous protein production …

Pre-trained protein language model for codon optimization

S Pathak, G Lin - bioRxiv, 2024 - biorxiv.org
Codon optimization of Open Reading Frame (ORF) sequences is
essential for enhancing mRNA stability and expression in applications like mRNA vaccines …

Natural Language Processing for Language of Life (mRNA vaccine design)

S Pathak - 2024 - era.library.ualberta.ca
The COVID-19 pandemic accelerated the development of mRNA vaccines, yet identifying
the optimal mRNA sequence for human use, particularly for the SARS-CoV-2 spike protein …