Computational graph pangenomics: a tutorial on data structures and their applications
Computational pangenomics is an emerging research field that is changing the way
computer scientists are facing challenges in biological sequence analysis. In past decades …
MONI: a pangenomic index for finding maximal exact matches
Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store
thousands of human genomes on a commodity computer. Then Kuhnle et al. showed how to …
A survey of BWT variants for string collections
D Cenzato, Z Lipták - Bioinformatics, 2024 - academic.oup.com
Motivation In recent years, the focus of bioinformatics research has moved from individual
sequences to collections of sequences. Given the fundamental role of the Burrows-Wheeler …
An upper bound and linear-space queries on the LZ-End parsing
Lempel–Ziv (LZ77) compression is the most commonly used lossless compression
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …
Computing MEMs and Relatives on Repetitive Text Collections
G Navarro - arXiv preprint arXiv:2210.09914, 2022 - arxiv.org
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given
pattern $P[1..m]$ on a large repetitive text collection $T[1..n]$, which is represented as a …
A survey of BWT variants for string collections
D Cenzato, Z Lipták - arXiv preprint arXiv:2202.13235, 2022 - arxiv.org
In recent years, the focus of bioinformatics research has moved from individual sequences to
collections of sequences. Given the fundamental role of the Burrows-Wheeler Transform …
Computing the original eBWT faster, simpler, and with less memory
Mantaci et al. TCS 2007 defined the eBWT to extend the definition of the BWT to
a collection of strings. However, since this introduction, it has been used more generally to …
A fast and small subsampled r-index
The $r$-index (Gagie et al., JACM 2020) represented a breakthrough in compressed
indexing of repetitive text collections, outperforming its alternatives by orders of magnitude …
Breaking the 𝒪(n)-Barrier in the Construction of Compressed Suffix Arrays and Suffix Trees
The suffix array, describing the lexicographical order of suffixes of a given text, and the suffix
tree, a path-compressed trie of all suffixes, are the two most fundamental data structures for …
LZ77 via prefix-free parsing
In this paper, we present an algorithm for constructing the Lempel-Ziv 77 (LZ77) factorization
using prefix-free parsing, an algorithm that was first developed as preprocessing algorithm …