Compressed text indexes: From theory to practice

P Ferragina, R González, G Navarro… - Journal of Experimental …, 2009 - dl.acm.org
A compressed full-text self-index represents a text in a compressed form and still answers
queries efficiently. This represents a significant advancement over the (full-)text indexing …

Prospects and limitations of full-text index structures in genome analysis

M Vyverman, B De Baets, V Fack… - Nucleic acids …, 2012 - academic.oup.com
The combination of incessant advances in sequencing technology producing large amounts
of data and innovative bioinformatics approaches, designed to cope with this data flood, has …

Optimal construction of compressed indexes for highly repetitive texts

D Kempa - Proceedings of the Thirtieth Annual ACM-SIAM …, 2019 - SIAM
We propose algorithms that, given the input string of length n over integer alphabet of size σ,
construct the Burrows–Wheeler transform (BWT), the permuted longest-common-prefix …

Stronger Lempel-Ziv based compressed text indexing

D Arroyuelo, G Navarro, K Sadakane - Algorithmica, 2012 - Springer
Given a text T[1..u] over an alphabet of size σ, the full-text search problem consists
in finding the occ occurrences of a given pattern P[1..m] in T. In indexed text searching we …

Algorithms and compressed data structures for information retrieval

S Ladra - 2011 - ruc.udc.es
In this thesis we address the problem of the efficiency in Information Retrieval by presenting
new compressed data structures and algorithms that can be used in different application …

Space-efficient construction of Lempel–Ziv compressed text indexes

D Arroyuelo, G Navarro - Information and Computation, 2011 - Elsevier
A compressed full-text self-index is a data structure that replaces a text and in addition gives
indexed access to it, while taking space proportional to the compressed text size. This is very …

Boosting text compression with word-based statistical encoding

A Farina, G Navarro, JR Paramá - The Computer Journal, 2012 - ieeexplore.ieee.org
Semistatic word-based byte-oriented compressors are known to be attractive alternatives to
compress natural language texts. With compression ratios around 30–35%, they allow fast …

Self-index based on LZ77 (thesis)

S Kreft, G Navarro - ar…

… of long next generation sequencing reads

M Vyverman - 2014 - core.ac.uk
This chapter introduces basic notations and concepts that are used throughout this work,
many of which will be familiar to readers with a background in bioinformatics. Section 1.1 …

On searching and extracting strings from compressed textual data.

R Venturini - 2010 - hpc.isti.cnr.it
A large fraction of the data we process every day consists of a sequence of symbols over an
alphabet, and hence is a text. Unformatted natural language documents, XML and HTML file …