Indexing highly repetitive string collections, part II: compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Compressed full-text indexes

G Navarro, V Mäkinen - ACM Computing Surveys (CSUR), 2007 - dl.acm.org
Full-text indexes provide fast substring search over large text collections. A serious problem
of these indexes has traditionally been their space consumption. A recent trend is to develop …

At the roots of dictionary compression: string attractors

D Kempa, N Prezza - Proceedings of the 50th Annual ACM SIGACT …, 2018 - dl.acm.org
A well-known fact in the field of lossless text compression is that high-order entropy is a
weak model when the input contains long repetitions. Motivated by this fact, decades of …

Resolution of the Burrows-Wheeler transform conjecture

D Kempa, T Kociumaka - Communications of the ACM, 2022 - dl.acm.org
The Burrows-Wheeler Transform (BWT) is an invertible text transformation that
permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the …

On compressing and indexing repetitive sequences

S Kreft, G Navarro - Theoretical Computer Science, 2013 - Elsevier
We introduce LZ-End, a new member of the Lempel–Ziv family of text compressors, which
achieves compression ratios close to those of LZ77 but is much faster at extracting arbitrary …

Alphabet-independent compressed text indexing

D Belazzougui, G Navarro - ACM Transactions on Algorithms (TALG), 2014 - dl.acm.org
Self-indexes are able to represent a text asymptotically within the information-theoretic lower
bound under the k-th order entropy model and offer access to any text substring and indexed …

Toward a definitive compressibility measure for repetitive sequences

T Kociumaka, G Navarro… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
While the k-th order empirical entropy is an accepted measure of the compressibility of
individual sequences on classical text collections, it is useful only for small values of k and …

Universal compressed text indexing

G Navarro, N Prezza - Theoretical Computer Science, 2019 - Elsevier
The rise of repetitive datasets has lately generated a lot of interest in compressed self-
indexes based on dictionary compression, a rich and heterogeneous family of techniques …

Optimal lower and upper bounds for representing sequences

D Belazzougui, G Navarro - ACM Transactions on Algorithms (TALG), 2015 - dl.acm.org
Sequence representations supporting the queries access, select, and rank are at the core of
many data structures. There is a considerable gap between the various upper bounds and …

LZ77 computation based on the run-length encoded BWT

A Policriti, N Prezza - Algorithmica, 2018 - Springer
Computing the LZ77 factorization is a fundamental task in text compression and indexing,
being the size z of this compressed representation closely related to the self-repetitiveness …