Conway–Bromage–Lyndon (CBL): an exact, dynamic representation of k-mer sets

I Martayan, B Cazaux, A Limasset, C Marchet - Bioinformatics, 2024 - academic.oup.com
In this article, we introduce the Conway–Bromage–Lyndon (CBL) structure, a compressed,
dynamic and exact method for representing k-mer sets. Originating from Conway and …

Function-Assigned Masked Superstrings as a Versatile and Compact Data Type for k-Mer Sets

O Sladký, P Veselý, K Břinda - bioRxiv, 2024 - biorxiv.org
The exponential growth of genome databases calls for novel space-efficient algorithms for
data compression and search. State-of-the-art approaches often rely on k-merization for data …

FroM Superstring to Indexing: a space-efficient index for unconstrained k-mer sets using the Masked Burrows-Wheeler Transform (MBWT)

O Sladký, P Veselý, K Břinda - bioRxiv, 2024 - biorxiv.org
The exponential growth of DNA sequencing data calls for efficient solutions for storing and
querying large-scale k-mer sets. While recent indexing approaches use spectrum …

Advances in practical k-mer sets: essentials for the curious

C Marchet - arXiv preprint arXiv:2409.05210, 2024 - arxiv.org
This paper provides a comprehensive survey of data structures for representing k-mer sets,
which are fundamental in high-throughput sequencing analysis. It categorizes the methods …

[BOOK] Compression Algorithms for De Bruijn Graph and Hidden Assembly Artifacts

A Rahman - 2023 - search.proquest.com
In this dissertation, I present four projects covering two main research objectives. The first
objective of my dissertation is to optimize storage usage of sequence analysis tools and …

[HTML] Unitigs are not enough: the advantages of superunitig-based algorithms in bioinformatics

S Schmidt - 2023 - helda.helsinki.fi
Unitigs are a central construct in many subfields of bioinformatics, including genome
assembly and the compact representation of k-mer spectra. In both of these subfields, using …

Approximation guarantees for shortest superstrings: simpler and better

M Englert, N Matsakis, P Veselý - 34th International Symposium …, 2023 - drops.dagstuhl.de
The Shortest Superstring problem is an NP-hard problem, in which given as input a
set of strings, we are looking for a string of minimum length that contains all input strings as …

Masked superstrings for efficient k-mer set representation and indexing

O Sladký - 2024 - dspace.cuni.cz
The exponential growth of genomic data calls for novel space-efficient algorithms for
compression and search. State-of-the-art approaches often rely on tokenization of the data …

Brisk: Exact resource-efficient dictionary for k-mers

C Smith, I Martayan, A Limasset, Y Dufresne - bioRxiv, 2024 - biorxiv.org
The rapid advancements in DNA sequencing technology have led to an unprecedented
increase in the generation of genomic datasets, with modern sequencers now capable of …

[PDF] Masked superstrings as a compact, indexable and dynamic representation of k-mer sets

O Sladký, P Veselý, K Břinda - SeqBim 2024-Journées sur les …, 2024 - inria.hal.science
The exponential growth of DNA sequencing data calls for efficient approaches for their
compression and search [1, 2]. Modern bioinformatics increasingly uses k-merization as a …