Indexing highly repetitive string collections, part II: compressed indexes

G Navarro - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Two decades ago, a breakthrough in indexing string collections made it possible to
represent them within their compressed space while at the same time offering indexed …

Compressed full-text indexes

G Navarro, V Mäkinen - ACM Computing Surveys (CSUR), 2007 - dl.acm.org
Full-text indexes provide fast substring search over large text collections. A serious problem
of these indexes has traditionally been their space consumption. A recent trend is to develop …

Opportunistic data structures with applications

P Ferragina, G Manzini - Proceedings 41st annual symposium …, 2000 - ieeexplore.ieee.org
We address the issue of compressing and indexing data. We devise a data structure whose
space occupancy is a function of the entropy of the underlying data set. We call the data …

Indexing compressed text

P Ferragina, G Manzini - Journal of the ACM (JACM), 2005 - dl.acm.org
We design two compressed data structures for the full-text indexing problem that support
efficient substring searches using roughly the space required for storing the text in …

A survey on blocking technology of entity resolution

BH Li, Y Liu, AM Zhang, WH Wang, S Wan - Journal of Computer Science …, 2020 - Springer
Entity resolution (ER) is a significant task in data integration, which aims to detect all entity
profiles that correspond to the same real-world entity. Due to its inherently quadratic …

Compressed suffix arrays and suffix trees with applications to text indexing and string matching

R Grossi, JS Vitter - Proceedings of the thirty-second annual ACM …, 2000 - dl.acm.org
The proliferation of online text, such as on the World Wide Web and in databases, motivates
the need for space-efficient index methods that support fast search. Consider a text T of n …

[BOOK] Algorithms on strings

M Crochemore, C Hancart, T Lecroq - 2007 - books.google.com
The book is intended for lectures on string processes and pattern matching in Master's
courses of computer science and software engineering curricula. The details of algorithms …

LSH forest: self-tuning indexes for similarity search

M Bawa, T Condie, P Ganesan - … of the 14th international conference on …, 2005 - dl.acm.org
We consider the problem of indexing high-dimensional data for answering (approximate)
similarity-search queries. Similarity indexes prove to be important in a wide variety of …

External memory algorithms and data structures: Dealing with massive data

JS Vitter - ACM Computing Surveys (CSUR), 2001 - dl.acm.org
Data sets in large applications are often too massive to fit completely inside the computer's
internal memory. The resulting input/output communication (or I/O) between fast internal …

Linear work suffix array construction

J Kärkkäinen, P Sanders, S Burkhardt - Journal of the ACM (JACM), 2006 - dl.acm.org
Suffix trees and suffix arrays are widely used and largely interchangeable index structures
on strings and sequences. Practitioners prefer suffix arrays due to their simplicity and space …
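Several of the entries above describe full-text indexes that answer substring searches, with the suffix array as the basic structure. The sketch below shows a plain, uncompressed suffix array in Python: it sorts the starting positions of all suffixes, then finds every occurrence of a pattern by binary search, because all suffixes that begin with the pattern form one contiguous block of the array. It is a naive illustration only. It is not the linear-work construction of Kärkkäinen, Sanders and Burkhardt, and it is not any of the compressed indexes listed. The function names `build_suffix_array` and `find_occurrences` are hypothetical.

```python
def build_suffix_array(text: str) -> list[int]:
    # Naive construction (an assumption for illustration): sort suffix start
    # positions by the suffix each one begins. Worst case O(n^2 log n),
    # NOT the linear-work algorithm from the cited paper.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    # Suffixes starting with `pattern` are contiguous in the suffix array,
    # so two binary searches over their m-character prefixes bound the block.
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Return text positions in increasing order.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)                                 # [5, 3, 1, 0, 4, 2]
    print(find_occurrences(text, sa, "ana"))  # [1, 3]
```

Each search step compares at most m characters, so a query costs O(m log n) time on top of storing the text and the n integers of the array. That integer array is the space cost the compressed indexes above aim to reduce.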