Computational graph pangenomics: a tutorial on data structures and their applications

JA Baaijens, P Bonizzoni, C Boucher… - Natural Computing, 2022 - Springer
Computational pangenomics is an emerging research field that is changing the way
computer scientists are facing challenges in biological sequence analysis. In past decades …

MONI: a pangenomic index for finding maximal exact matches

M Rossi, M Oliva, B Langmead, T Gagie… - Journal of …, 2022 - liebertpub.com
Recently, Gagie et al. proposed a version of the FM-index, called the r-index, that can store
thousands of human genomes on a commodity computer. Then Kuhnle et al. showed how to …
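For readers new to the FM-index family, here is a minimal Python sketch with three parts:

- It builds a Burrows-Wheeler Transform (BWT) by sorting rotations.
- It counts the BWT's runs (maximal blocks of equal characters). The r in "r-index" is this number of runs, which stays small on highly repetitive collections.
- It counts pattern occurrences by FM-index backward search.

The function names and the `$` sentinel are illustrative assumptions. The sentinel must be absent from the text and sort before every other character. This naive construction is not how the r-index or MONI are actually built.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """BWT via sorted rotations of text + sentinel (naive, for illustration)."""
    s = text + sentinel  # assumes the sentinel is unique and sorts first
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_runs(b: str) -> int:
    """Number of maximal runs of equal characters (the r of the r-index)."""
    return sum(1 for i in range(len(b)) if i == 0 or b[i] != b[i - 1])


def backward_search_count(b: str, pattern: str) -> int:
    """Count occurrences of pattern using only the BWT (backward search)."""
    # C[c]: how many characters in b are strictly smaller than c.
    C = {}
    total = 0
    for c in sorted(set(b)):
        C[c] = total
        total += b.count(c)
    lo, hi = 0, len(b)  # half-open range of sorted rotations matching so far
    for c in reversed(pattern):
        if c not in C:
            return 0
        # LF-mapping: occurrences of c before each boundary, shifted by C[c].
        lo = C[c] + b[:lo].count(c)
        hi = C[c] + b[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo


b = bwt("banana")
print(b, count_runs(b), backward_search_count(b, "ana"))  # annb$aa 5 2
```

Real implementations replace the `count` calls with rank structures over a run-length encoded BWT. That substitution is what keeps space proportional to r rather than to the text length.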

A survey of BWT variants for string collections

D Cenzato, Z Lipták - Bioinformatics, 2024 - academic.oup.com
Motivation In recent years, the focus of bioinformatics research has moved from individual
sequences to collections of sequences. Given the fundamental role of the Burrows-Wheeler …

An upper bound and linear-space queries on the LZ-End parsing

D Kempa, B Saha - Proceedings of the 2022 Annual ACM-SIAM …, 2022 - SIAM
Lempel–Ziv (LZ77) compression is the most commonly used lossless compression
algorithm. The basic idea is to greedily break the input string into blocks (called “phrases”) …
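A naive sketch of the greedy parsing idea follows. It assumes a common LZ77 variant in which each phrase is one of two things:

- a literal character that has not occurred before, or
- the longest prefix of the remaining text that also starts at an earlier position, with overlapping sources allowed.

Other variants append an explicit next character to each phrase. The helper names and phrase encoding are hypothetical. The sketch does not implement LZ-End, the paper's subject, which additionally requires each copied source to end at a previous phrase boundary.

```python
from typing import List, Tuple, Union

# A phrase is a literal character or a (source_position, length) copy.
Phrase = Union[str, Tuple[int, int]]


def lz77_parse(text: str) -> List[Phrase]:
    """Greedy LZ77 parsing (naive; hypothetical helper for illustration)."""
    phrases: List[Phrase] = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_src = 0, -1
        for j in range(i):  # every earlier starting position is a candidate source
            length = 0
            # Overlap is allowed: the source may run into the phrase itself.
            while i + length < n and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_src = length, j
        if best_len == 0:
            phrases.append(text[i])  # character not seen before: emit a literal
            i += 1
        else:
            phrases.append((best_src, best_len))
            i += best_len
    return phrases


def lz77_decode(phrases: List[Phrase]) -> str:
    """Invert lz77_parse by copying character by character (handles overlaps)."""
    out: List[str] = []
    for p in phrases:
        if isinstance(p, str):
            out.append(p)
        else:
            src, length = p
            for k in range(length):
                out.append(out[src + k])
    return "".join(out)


print(lz77_parse("abababa"))  # ['a', 'b', (0, 5)]
```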

Computing MEMs and Relatives on Repetitive Text Collections

G Navarro - arXiv preprint arXiv:2210.09914, 2022 - arxiv.org
We consider the problem of computing the Maximal Exact Matches (MEMs) of a given
pattern $P[1..m]$ on a large repetitive text collection $T[1..n]$, which is represented as a …
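To make the MEM definition concrete, here is a brute-force sketch using 0-based indices (the snippet uses 1-based $P[1..m]$).

1. It computes matching statistics: `ms[i]` is the length of the longest prefix of `P[i:]` that occurs in `T`.
2. Each such match is right-maximal by construction.
3. A match is left-maximal exactly when `i == 0` or `ms[i-1] <= ms[i]`. Extending it one character to the left would require `ms[i-1] > ms[i]`.

The helper names are hypothetical. This is an illustration of the definition, not the compressed-index method of the cited paper.

```python
from typing import List, Tuple


def matching_statistics(pattern: str, text: str) -> List[int]:
    """ms[i] = length of the longest prefix of pattern[i:] that occurs in text."""
    ms = []
    for i in range(len(pattern)):
        length = 0
        while i + length < len(pattern) and pattern[i:i + length + 1] in text:
            length += 1
        ms.append(length)
    return ms


def mems(pattern: str, text: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, length) of every MEM of pattern in text (0-based, brute force)."""
    ms = matching_statistics(pattern, text)
    result = []
    for i, length in enumerate(ms):
        # pattern[i:i+length] is right-maximal by definition of ms[i];
        # it can be extended to the left exactly when ms[i-1] > length.
        if length >= min_len and (i == 0 or ms[i - 1] <= length):
            result.append((i, length))
    return result


print(mems("abcd", "xabcyzbcdw"))  # [(0, 3), (1, 3)]
```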

A survey of BWT variants for string collections

D Cenzato, Z Lipták - arXiv preprint arXiv:2202.13235, 2022 - arxiv.org
In recent years, the focus of bioinformatics research has moved from individual sequences to
collections of sequences. Given the fundamental role of the Burrows-Wheeler Transform …

Computing the original eBWT faster, simpler, and with less memory

C Boucher, D Cenzato, Z Lipták, M Rossi… - … Symposium on String …, 2021 - Springer
Mantaci et al. TCS 2007 defined the eBWT to extend the definition of the BWT to
a collection of strings. However, since this introduction, it has been used more generally to …
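The eBWT definition can be illustrated with a naive sketch in three steps:

1. Collect every conjugate (rotation) of every input string.
2. Sort the conjugates by the ω-order, in which u precedes v when the infinite repetition u^ω is lexicographically smaller than v^ω.
3. Read off the last character of each conjugate.

By the Fine and Wilf periodicity theorem, comparing the first |u|+|v| characters of u^ω and v^ω decides the comparison. The sketch assumes nonempty primitive input strings (not proper powers of a shorter string). It uses hypothetical names and is far slower than the algorithm the cited paper proposes.

```python
from functools import cmp_to_key
from typing import List


def omega_cmp(u: str, v: str) -> int:
    """Compare u^omega with v^omega (infinite repetitions), returning -1, 0 or 1."""
    n = len(u) + len(v)  # by Fine and Wilf, n characters decide the comparison
    uu = (u * (n // len(u) + 1))[:n]
    vv = (v * (n // len(v) + 1))[:n]
    return (uu > vv) - (uu < vv)


def ebwt(strings: List[str]) -> str:
    """Naive eBWT of a collection of nonempty primitive strings."""
    conjugates = [s[i:] + s[:i] for s in strings for i in range(len(s))]
    conjugates.sort(key=cmp_to_key(omega_cmp))
    return "".join(c[-1] for c in conjugates)


print(ebwt(["ab", "aab"]))  # babaa
```

For a single primitive string, every conjugate has the same length, so the ω-order coincides with plain lexicographic order. In that case the eBWT equals the sentinel-free BWT of the rotations, for example "nnbaaa" for "banana".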

A fast and small subsampled r-index

D Cobas, T Gagie, G Navarro - arXiv preprint arXiv:2103.15329, 2021 - arxiv.org
The $r$-index (Gagie et al., JACM 2020) represented a breakthrough in compressed
indexing of repetitive text collections, outperforming its alternatives by orders of magnitude …

Breaking the 𝒪(n)-Barrier in the Construction of Compressed Suffix Arrays and Suffix Trees

D Kempa, T Kociumaka - Proceedings of the 2023 Annual ACM-SIAM …, 2023 - SIAM
The suffix array, describing the lexicographical order of suffixes of a given text, and the suffix
tree, a path-compressed trie of all suffixes, are the two most fundamental data structures for …
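The suffix array definition can be illustrated directly, together with the LCP array. The LCP array is computed here with Kasai et al.'s linear-time algorithm. Suffix array plus LCP array is a common building block for compressed suffix trees.

Sorting whole suffixes as done here costs roughly quadratic time or worse. That is exactly the kind of construction cost the cited work aims to beat. The function names are illustrative only.

```python
from typing import List


def suffix_array(text: str) -> List[int]:
    """Starting positions of all suffixes, in lexicographic order (naive sort)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def lcp_array(text: str, sa: List[int]) -> List[int]:
    """Kasai et al.: lcp[r] = longest common prefix of suffixes sa[r-1] and sa[r]."""
    n = len(text)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp = [0] * n
    h = 0
    for i in range(n):  # visit suffixes in text order, reusing h between steps
        if rank[i] > 0:
            j = sa[rank[i] - 1]  # lexicographic predecessor of suffix i
            while i + h < n and j + h < n and text[i + h] == text[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h > 0:
                h -= 1  # suffix i+1 shares at least h-1 characters with its predecessor
        else:
            h = 0
    return lcp


sa = suffix_array("banana")
print(sa, lcp_array("banana", sa))  # [5, 3, 1, 0, 4, 2] [0, 1, 3, 0, 0, 2]
```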

LZ77 via prefix-free parsing

A Hong, M Rossi, C Boucher - 2023 Proceedings of the Symposium on …, 2023 - SIAM
In this paper, we present an algorithm for constructing the Lempel-Ziv 77 (LZ77) factorization
using prefix-free parsing, an algorithm that was first developed as preprocessing algorithm …