Conway–Bromage–Lyndon (CBL): an exact, dynamic representation of k-mer sets
In this article, we introduce the Conway–Bromage–Lyndon (CBL) structure, a compressed,
dynamic and exact method for representing k-mer sets. Originating from Conway and …
Function-Assigned Masked Superstrings as a Versatile and Compact Data Type for k-Mer Sets
The exponential growth of genome databases calls for novel space-efficient algorithms for
data compression and search. State-of-the-art approaches often rely on k-merization for data …
FroM Superstring to Indexing: a space-efficient index for unconstrained k-mer sets using the Masked Burrows-Wheeler Transform (MBWT)
The exponential growth of DNA sequencing data calls for efficient solutions for storing and
querying large-scale k-mer sets. While recent indexing approaches use spectrum …
Advances in practical k-mer sets: essentials for the curious
C Marchet - arXiv preprint arXiv:2409.05210, 2024 - arxiv.org
This paper provides a comprehensive survey of data structures for representing k-mer sets,
which are fundamental in high-throughput sequencing analysis. It categorizes the methods …
[BOOK][B] Compression Algorithms for De Bruijn Graph and Hidden Assembly Artifacts
A Rahman - 2023 - search.proquest.com
In this dissertation, I present four projects covering two main research objectives. The first
objective of my dissertation is to optimize storage usage of sequence analysis tools and …
[HTML][HTML] Unitigs are not enough: the advantages of superunitig-based algorithms in bioinformatics
S Schmidt - 2023 - helda.helsinki.fi
Unitigs are a central construct in many subfields of bioinformatics, including genome
assembly and the compact representation of k-mer spectra. In both of these subfields, using …
Approximation guarantees for shortest superstrings: simpler and better
The Shortest Superstring problem is an NP-hard problem, in which given as input a
set of strings, we are looking for a string of minimum length that contains all input strings as …
Masked superstrings for efficient k-mer set representation and indexing
O Sladký - 2024 - dspace.cuni.cz
The exponential growth of genomic data calls for novel space-efficient algorithms for
compression and search. State-of-the-art approaches often rely on tokenization of the data …
Brisk: Exact resource-efficient dictionary for k-mers
The rapid advancements in DNA sequencing technology have led to an unprecedented
increase in the generation of genomic datasets, with modern sequencers now capable of …
[PDF][PDF] Masked superstrings as a compact, indexable and dynamic representation of k-mer sets
The exponential growth of DNA sequencing data calls for efficient approaches for their
compression and search [1, 2]. Modern bioinformatics increasingly uses k-merization as a …