Indexing highly repetitive string collections, part II: Compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Fully functional suffix trees and optimal text searching in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Journal of the ACM (JACM), 2020 - dl.acm.org
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

At the roots of dictionary compression: string attractors

D Kempa, N Prezza - Proceedings of the 50th Annual ACM SIGACT …, 2018 - dl.acm.org
A well-known fact in the field of lossless text compression is that high-order entropy is a
weak model when the input contains long repetitions. Motivated by this fact, decades of …

Collapsing the hierarchy of compressed data structures: Suffix arrays in optimal compressed space

D Kempa, T Kociumaka - 2023 IEEE 64th Annual Symposium …, 2023 - ieeexplore.ieee.org
The last two decades have witnessed a dramatic increase in the amount of highly repetitive
datasets consisting of sequential data (strings, texts). Processing these massive amounts of …

Optimal-time text indexing in BWT-runs bounded space

T Gagie, G Navarro, N Prezza - Proceedings of the Twenty-Ninth Annual ACM …, 2018 - SIAM
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …

String synchronizing sets: sublinear-time BWT construction and optimal LCE data structure

D Kempa, T Kociumaka - Proceedings of the 51st Annual ACM SIGACT …, 2019 - dl.acm.org
Burrows–Wheeler transform (BWT) is an invertible text transformation that, given a text T of
length n, permutes its symbols according to the lexicographic order of suffixes of T. BWT is …
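The snippet above defines the BWT as a permutation of T's symbols driven by the lexicographic order of its suffixes. As a rough illustration only, here is a naive Python sketch of that definition, together with its inverse (via the standard LF-mapping) and a count of r, the number of maximal equal-symbol runs in the BWT, which is the "BWT-runs" measure in several titles in this list. The helper names `bwt`, `inverse_bwt` and `count_runs` are hypothetical, and the sketch assumes a unique sentinel `$` that sorts before every other symbol; it runs in roughly O(n^2 log n) time and is not the sublinear-time construction of the paper above.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive BWT: sort the suffixes of text+sentinel, emit each suffix's preceding symbol.

    Assumption: the sentinel does not occur in text and sorts before all other symbols.
    """
    assert sentinel not in text
    t = text + sentinel
    # Suffix array: starting positions of all suffixes in lexicographic order.
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    # t[i - 1] wraps to the sentinel when i == 0 (cyclic predecessor).
    return "".join(t[i - 1] for i in sa)


def inverse_bwt(b: str) -> str:
    """Recover the text (without sentinel) from its BWT using the LF-mapping.

    Assumption: b was produced by bwt() above with a unique, smallest sentinel.
    """
    n = len(b)
    # A stable sort of the L-column positions by symbol yields the F column.
    order = sorted(range(n), key=lambda j: b[j])
    # lf[j] = F-row holding the same symbol occurrence as L-row j.
    lf = [0] * n
    for k, j in enumerate(order):
        lf[j] = k
    out = []
    row = 0  # row 0 is the suffix consisting of the sentinel alone
    for _ in range(n - 1):
        out.append(b[row])  # symbol preceding the current suffix
        row = lf[row]       # step one position to the left in the text
    return "".join(reversed(out))


def count_runs(s: str) -> int:
    """Number r of maximal runs of equal symbols in s."""
    return sum(1 for j in range(len(s)) if j == 0 or s[j] != s[j - 1])


if __name__ == "__main__":
    b = bwt("banana")
    print(b, count_runs(b))  # annb$aa 5
    print(inverse_bwt(b))    # banana
```

Equal symbols that precede lexicographically close suffixes end up adjacent in the output. This is why r tends to stay small on highly repetitive inputs, which is the setting that the BWT-runs bounded indexes in this list target.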

Near-optimal quantum algorithms for bounded edit distance and Lempel-Ziv factorization

D Gibney, C Jin, T Kociumaka, SV Thankachan - Proceedings of the 2024 …, 2024 - SIAM
Measuring sequence similarity and compressing texts are among the most fundamental
tasks in string algorithms. In this work, we develop near-optimal quantum algorithms for the …

External memory BWT and LCP computation for sequence collections with applications

L Egidi, FA Louza, G Manzini, GP Telles - Algorithms for Molecular Biology, 2019 - Springer
Background Sequencing technologies produce larger and larger collections of
biosequences that have to be stored in compressed indices supporting fast search …

Text indexing for long patterns: Anchors are all you need

L Ayad, G Loukidis, S Pissis - Proceedings of the VLDB Endowment …, 2023 - kclpure.kcl.ac.uk
In many real-world database systems, a large fraction of the data is represented by strings:
sequences of letters over some alphabet. This is because strings can easily encode data …

On the complexity of BWT-runs minimization via alphabet reordering

J Bentley, D Gibney, SV Thankachan - arXiv preprint arXiv:1911.03035, 2019 - arxiv.org
The Burrows-Wheeler Transform (BWT) has been an essential tool in text compression and
indexing. First introduced in 1994, it went on to provide the backbone for the first encoding of …