Indexing highly repetitive string collections, part II: Compressed indexes
G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …
represent them within their compressed space while at the same time offering indexed …
Fully functional suffix trees and optimal text searching in BWT-runs bounded space
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …
versioned text collections—has become an important problem since the turn of the …
At the roots of dictionary compression: string attractors
A well-known fact in the field of lossless text compression is that high-order entropy is a
weak model when the input contains long repetitions. Motivated by this fact, decades of …
weak model when the input contains long repetitions. Motivated by this fact, decades of …
Collapsing the hierarchy of compressed data structures: Suffix arrays in optimal compressed space
The last two decades have witnessed a dramatic increase in the amount of highly repetitive
datasets consisting of sequential data (strings, texts). Processing these massive amounts of …
datasets consisting of sequential data (strings, texts). Processing these massive amounts of …
Resolution of the burrows-wheeler transform conjecture
Abstract The Burrows-Wheeler Transform (BWT) is an invertible text transformation that
permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the …
permutes symbols of a text according to the lexicographical order of its suffixes. BWT is the …
[BOOK][B] Genome-scale algorithm design
High-throughput sequencing has revolutionised the field of biological sequence analysis. Its
application has enabled researchers to address important biological questions, often for the …
application has enabled researchers to address important biological questions, often for the …
Optimal-time text indexing in BWT-runs bounded space
Indexing highly repetitive texts—such as genomic databases, software repositories and
versioned text collections—has become an important problem since the turn of the …
versioned text collections—has become an important problem since the turn of the …
Towards a definitive measure of repetitiveness
Unlike in statistical compression, where Shannon's entropy is a definitive lower bound, no
such clear measure exists for the compressibility of repetitive sequences. Since statistical …
such clear measure exists for the compressibility of repetitive sequences. Since statistical …
On compressing and indexing repetitive sequences
S Kreft, G Navarro - Theoretical Computer Science, 2013 - Elsevier
We introduce LZ-End, a new member of the Lempel–Ziv family of text compressors, which
achieves compression ratios close to those of LZ77 but is much faster at extracting arbitrary …
achieves compression ratios close to those of LZ77 but is much faster at extracting arbitrary …
Document spanners-a brief overview of concepts, results, and recent developments
The information extraction framework of document spanners was introduced by Fagin,
Kimelfeld, Reiss, and Vansummeren (PODS 2013, J. ACM 2015) as a formalisation of the …
Kimelfeld, Reiss, and Vansummeren (PODS 2013, J. ACM 2015) as a formalisation of the …