Self-supervised learning in medicine and healthcare

R Krishnan, P Rajpurkar, EJ Topol - Nature Biomedical Engineering, 2022 - nature.com
The development of medical applications of machine learning has required manual
annotation of data, often by medical experts. Yet, the availability of large-scale unannotated …

Genome assemblies of 11 bamboo species highlight diversification induced by dynamic subgenome dominance

PF Ma, YL Liu, C Guo, G **, ZH Guo, L Mao… - Nature Genetics, 2024 - nature.com
Polyploidy (genome duplication) is a pivotal force in evolution. However, the interactions
between parental genomes in a polyploid nucleus, frequently involving subgenome …

MeShClust v3.0: high-quality clustering of DNA sequences using the mean shift algorithm and alignment-free identity scores

HZ Girgis - BMC Genomics, 2022 - Springer
Background: Tools for accurately clustering biological sequences are among the most
important tools in computational biology. Two pioneering tools for clustering sequences are …

Alignment-free sequence comparison: A systematic survey from a machine learning perspective

KS Bohnsack, M Kaden, J Abel… - IEEE/ACM Transactions …, 2022 - ieeexplore.ieee.org
The encounter of large amounts of biological sequence data generated during the last
decades and the algorithmic and hardware improvements have offered the possibility to …

Haplotype-resolved nonaploid genome provides insights into in vitro flowering in bamboos

YJ Wang, C Guo, L Zhao, L Mao, XZ Hu… - Horticulture …, 2024 - academic.oup.com
Woody bamboos (Bambusoideae) are renowned for their polyploidy and rare flowering.
Bambusa odashimae is one of the bamboo species with the highest chromosome count …

KINN: an alignment-free accurate phylogeny reconstruction method based on inner distance distributions of k-mer pairs in biological sequences

R Tang, Z Yu, J Li - Molecular Phylogenetics and Evolution, 2023 - Elsevier
Alignment-based methods have faced disadvantages in sequence comparison and
phylogeny reconstruction due to their high computational complexity. Alignment-free …

CGRclust: Chaos Game Representation for twin contrastive clustering of unlabelled DNA sequences

F Alipour, KA Hill, L Kari - BMC Genomics, 2024 - Springer
Traditional supervised learning methods applied to DNA sequence taxonomic classification
rely on the labor-intensive and time-consuming step of labelling the primary DNA …

Quantifying Bone Collagen Fingerprint Variation Between Species

A Baker, M Buckley - Molecular Ecology Resources, 2025 - Wiley Online Library
Collagen is the most ubiquitous protein in the animal kingdom and one of the most abundant
proteins on Earth. Despite having a relatively repetitive amino acid sequence motif that …

Mottle: Accurate pairwise substitution distance at high divergence through the exploitation of short-read mappers and gradient descent

A Prusokiene, N Boonham, A Fox, TP Howard - PLOS ONE, 2024 - journals.plos.org
Current tools for estimating the substitution distance between two related sequences
struggle to remain accurate at a high divergence. Difficulties at distant homologies, such as …

Look4LTRs: a Long terminal repeat retrotransposon detection tool capable of cross species studies and discovering recently nested repeats

AB Garza, E Lerat, HZ Girgis - Mobile DNA, 2024 - Springer
Plant genomes include large numbers of transposable elements. One particular type of
these elements is flanked by two Long Terminal Repeats (LTRs) and can translocate using …