Guiding questions to avoid data leakage in biological machine learning applications

J Bernett, DB Blumenthal, DG Grimm, F Haselbeck… - Nature …, 2024 - nature.com
Abstract: Machine learning methods for extracting patterns from high-dimensional data are
very important in the biological sciences. However, in certain cases, real-world applications …

TEMPRO: nanobody melting temperature estimation model using protein embeddings

JAE Alvarez, SN Dean - Scientific Reports, 2024 - nature.com
Single-domain antibodies (sdAbs) or nanobodies have received widespread attention due
to their small size (~15 kDa) and diverse applications in bio-derived therapeutics. As many …

TemStaPro: protein thermostability prediction using sequence representations from protein language models

I Pudžiuvelytė, K Olechnovič, E Godliauskaite… - …, 2024 - academic.oup.com
Motivation: Reliable prediction of protein thermostability from its sequence is valuable for
both academic and industrial research. This prediction problem can be tackled using …

Transitioning from wet lab to artificial intelligence: a systematic review of AI predictors in CRISPR

AF Abbasi, MN Asim, A Dengel - Journal of Translational …, 2025 - pmc.ncbi.nlm.nih.gov
The revolutionary CRISPR-Cas9 system leverages a programmable guide RNA (gRNA) and
Cas9 proteins to precisely cleave problematic regions within DNA sequences. This …

TemBERTure: advancing protein thermostability prediction with deep learning and attention mechanisms

C Rodella, S Lazaridi, T Lemmin - Bioinformatics advances, 2024 - academic.oup.com
Motivation: Understanding protein thermostability is essential for numerous biotechnological
applications, but traditional experimental methods are time-consuming, expensive, and error …

PTSP-BERT: Predict the thermal stability of proteins using sequence-based bidirectional representations from transformer-embedded features

Z Lv, M Wei, H Pei, S Peng, M Li, L Jiang - Computers in Biology and …, 2025 - Elsevier
Thermophilic proteins, mesophilic proteins, and psychrophilic proteins have wide industrial
applications, as enzymes with different optimal temperatures are often needed for different …

ThermoFinder: A sequence-based thermophilic proteins prediction framework

H Yu, X Luo - International Journal of Biological Macromolecules, 2024 - Elsevier
Thermophilic proteins are important for academic research and industrial processes, and
various computational methods have been developed to identify and screen them. However …

Classifying alkaliphilic proteins using embeddings from protein language model

M Susanty, MKN Mursalim, R Hertadi… - Computers in Biology …, 2024 - Elsevier
Alkaliphilic proteins have great potential as biocatalysts in biotechnology, especially for
enzyme engineering. Extensive research has focused on exploring the enzymatic potential …

RNA sequence analysis landscape: A comprehensive review of task types, databases, datasets, word embedding methods, and language models

MN Asim, MA Ibrahim, T Asif, A Dengel - Heliyon, 2025 - cell.com
Deciphering information of RNA sequences reveals their diverse roles in living organisms,
including gene regulation and protein synthesis. Aberrations in RNA sequence such as …

Leveraging protein language model embeddings and logistic regression for efficient and accurate in-silico acidophilic proteins classification

M Susanty, MKN Mursalim, R Hertadi… - … Biology and Chemistry, 2024 - Elsevier
The increasing demand for eco-friendly technologies in biotechnology necessitates effective
and sustainable catalysts. Acidophilic proteins, functioning optimally in highly acidic …