When less is more: sketching with minimizers in genomics

M Ndiaye, S Prieto-Baños, LM Fitzgerald… - Genome biology, 2024 - Springer
The exponential increase in sequencing data calls for conceptual and computational
advances to extract useful biological insights. One such advance, minimizers, allows for …

A survey of mapping algorithms in the long-reads era

K Sahlin, T Baudeau, B Cazaux, C Marchet - Genome Biology, 2023 - Springer
It has been over a decade since the first publication of a method dedicated entirely to
mapping long-reads. The distinctive characteristics of long reads resulted in methods …

Recombination between heterologous human acrocentric chromosomes

A Guarracino, S Buonaiuto, LG de Lima, T Potapova… - Nature, 2023 - nature.com
The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share
large homologous regions, including ribosomal DNA repeats and extended segmental …

Fast and robust metagenomic sequence comparison through sparse chaining with skani

J Shaw, YW Yu - Nature Methods, 2023 - nature.com
Sequence comparison tools for metagenome-assembled genomes (MAGs) struggle with
high-volume or low-quality data. We present skani (https://github.com/bluenote-1577/skani) …

Minmers are a generalization of minimizers that enable unbiased local Jaccard estimation

B Kille, E Garrison, TJ Treangen, AM Phillippy - Bioinformatics, 2023 - academic.oup.com
Motivation: The Jaccard similarity on k-mer sets has shown to be a convenient proxy
for sequence identity. By avoiding expensive base-level alignments and comparing reduced …

Creating and using minimizer sketches in computational genomics

H Zheng, G Marçais, C Kingsford - Journal of Computational …, 2023 - liebertpub.com
Processing large data sets has become an essential part of computational genomics.
Greatly increased availability of sequence data from multiple sources has fueled …

An Efficient Parallel Sketch-based Algorithmic Workflow for Mapping Long Reads

T Rahman, O Bhowmik… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Long read technologies are continuing to evolve at a rapid pace, with the latest of the high
fidelity technologies delivering reads over 10Kbp with high accuracy (99.9%). Classical long …

ESKEMAP: exact sketch-based read mapping

T Schulz, P Medvedev - Algorithms for Molecular Biology, 2024 - Springer
Background: Given a sequencing read, the broad goal of read mapping is to find the location(s)
in the reference genome that have a “similar sequence”. Traditionally, “similar sequence” …

Efficient reconciliation of genomic datasets of high similarity

Y Shibuya, D Belazzougui, G Kucherov - bioRxiv, 2022 - biorxiv.org
We apply Invertible Bloom Lookup Tables (IBLTs) to comparison of k-mer sets
originated from large DNA sequence datasets. We show that for similar datasets, IBLTs …

Exact Sketch-Based Read Mapping

T Schulz, P Medvedev - LIPIcs: Leibniz international proceedings in …, 2023 - ncbi.nlm.nih.gov
Given a sequencing read, the broad goal of read mapping is to find the location(s) in the
reference genome that have a “similar sequence”. Traditionally, “similar sequence” was …