When less is more: sketching with minimizers in genomics
The exponential increase in sequencing data calls for conceptual and computational
advances to extract useful biological insights. One such advance, minimizers, allows for …
A survey of mapping algorithms in the long-reads era
It has been over a decade since the first publication of a method dedicated entirely to
mapping long-reads. The distinctive characteristics of long reads resulted in methods …
Recombination between heterologous human acrocentric chromosomes
The short arms of the human acrocentric chromosomes 13, 14, 15, 21 and 22 (SAACs) share
large homologous regions, including ribosomal DNA repeats and extended segmental …
Fast and robust metagenomic sequence comparison through sparse chaining with skani
Sequence comparison tools for metagenome-assembled genomes (MAGs) struggle with
high-volume or low-quality data. We present skani (https://github.com/bluenote-1577/skani) …
Minmers are a generalization of minimizers that enable unbiased local Jaccard estimation
Abstract. Motivation: The Jaccard similarity on k-mer sets has been shown to be a convenient proxy
for sequence identity. By avoiding expensive base-level alignments and comparing reduced …
Creating and using minimizer sketches in computational genomics
Processing large data sets has become an essential part of computational genomics.
Greatly increased availability of sequence data from multiple sources has fueled …
An Efficient Parallel Sketch-based Algorithmic Workflow for Mapping Long Reads
Long read technologies are continuing to evolve at a rapid pace, with the latest of the high
fidelity technologies delivering reads over 10Kbp with high accuracy (99.9%). Classical long …
ESKEMAP: exact sketch-based read mapping
T Schulz, P Medvedev - Algorithms for Molecular Biology, 2024 - Springer
Background: Given a sequencing read, the broad goal of read mapping is to find the location(s)
in the reference genome that have a “similar sequence”. Traditionally, “similar sequence” …
Efficient reconciliation of genomic datasets of high similarity
Abstract: We apply Invertible Bloom Lookup Tables (IBLTs) to comparison of k-mer sets
originated from large DNA sequence datasets. We show that for similar datasets, IBLTs …
Exact Sketch-Based Read Mapping
T Schulz, P Medvedev - LIPIcs: Leibniz international proceedings in …, 2023 - ncbi.nlm.nih.gov
Given a sequencing read, the broad goal of read mapping is to find the location(s) in the
reference genome that have a “similar sequence”. Traditionally, “similar sequence” was …