Compressed full-text indexes

G Navarro, V Mäkinen - ACM Computing Surveys (CSUR), 2007 - dl.acm.org
Full-text indexes provide fast substring search over large text collections. A serious problem
of these indexes has traditionally been their space consumption. A recent trend is to develop …

Data structures based on k-mers for querying large collections of sequencing data sets

C Marchet, C Boucher, SJ Puglisi, P Medvedev… - Genome …, 2021 - genome.cshlp.org
High-throughput sequencing data sets are usually deposited in public repositories (e.g., the
European Nucleotide Archive) to ensure reproducibility. As the amount of data has reached …

Autoregressive search engines: Generating substrings as document identifiers

M Bevilacqua, G Ottaviano, P Lewis… - Advances in …, 2022 - proceedings.neurips.cc
Knowledge-intensive language tasks require NLP systems to both provide the
correct answer and retrieve supporting evidence for it in a given corpus. Autoregressive …

The case for learned index structures

T Kraska, A Beutel, EH Chi, J Dean… - Proceedings of the 2018 …, 2018 - dl.acm.org
Indexes are models: a B-Tree-Index can be seen as a model to map a key to the position of a
record within a sorted array, a Hash-Index as a model to map a key to a position of a record …
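A minimal sketch of the "index as model" idea in the snippet above. It assumes a single least-squares linear model over a non-empty list of numeric keys; the class `LinearLearnedIndex` and its methods are hypothetical, and this is a simplification, not the recursive model index proposed in the paper. The model predicts a position, and the worst-case prediction error recorded at build time bounds the binary-search window around that prediction.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical sketch: one linear model maps a key to an approximate array position."""

    def __init__(self, keys):
        self.keys = sorted(keys)  # assumed non-empty and numeric
        n = len(self.keys)
        # Least-squares fit of position ~ slope * key + intercept.
        mean_k = sum(self.keys) / n
        mean_p = (n - 1) / 2
        var = sum((k - mean_k) ** 2 for k in self.keys)
        cov = sum((k - mean_k) * (p - mean_p) for p, k in enumerate(self.keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_p - self.slope * mean_k
        # Worst-case error over the stored keys bounds the search window at lookup time.
        self.max_err = max(abs(self._predict(k) - p) for p, k in enumerate(self.keys))

    def _predict(self, key):
        return int(round(self.slope * key + self.intercept))

    def lookup(self, key):
        """Return the position of the first occurrence of key, or None if absent."""
        n = len(self.keys)
        guess = self._predict(key)
        # Clamp the window to [0, n] so that lo <= hi always holds.
        lo = min(max(0, guess - self.max_err), n)
        hi = max(lo, min(n, guess + self.max_err + 1))
        i = bisect.bisect_left(self.keys, key, lo, hi)
        if i < n and self.keys[i] == key:
            return i
        return None
```

For example, `LinearLearnedIndex([x * x for x in range(100)]).lookup(49)` returns 7, since 49 is the eighth square. The error bound is what makes the model safe to use: a bad prediction only widens the window that binary search must cover, it never produces a wrong answer.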

[BOOK] Modern information retrieval

R Baeza-Yates, B Ribeiro-Neto - 1999 - people.ischool.berkeley.edu
Information retrieval (IR) has changed considerably in recent years with the expansion of the
World Wide Web and the advent of modern and inexpensive graphical user interfaces and …

POCLib: A high-performance framework for enabling near orthogonal processing on compression

F Zhang, J Zhai, X Shen, O Mutlu… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Parallel technology boosts data processing in recent years, and parallel direct data
processing on hierarchically compressed documents exhibits great promise. The high …

Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets

R Raman, V Raman, SR Satti - ACM Transactions on Algorithms (TALG), 2007 - dl.acm.org
We consider the indexable dictionary problem, which consists of storing a set S ⊆ {0, …, m − 1}
for some integer m while supporting the operations of rank(x), which returns the number …
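To illustrate the rank operation, here is an uncompressed sketch for a set S ⊆ {0, …, m − 1}: a bitvector plus prefix counts sampled every 64 positions, with select(i) recovered by binary search over rank. It assumes rank(x) counts the elements of S that are ≤ x (one common convention; the snippet is cut off before the definition). The class `RankDictionary` is hypothetical and does not achieve the succinct space bounds studied in the paper; it only shows the interface.

```python
class RankDictionary:
    """Hypothetical, uncompressed rank/select structure for S ⊆ {0, ..., m - 1}."""

    BLOCK = 64  # sampling rate for the precomputed prefix counts

    def __init__(self, elements, m: int):
        self.m = m
        self.bits = bytearray(m)  # one byte per position: clear, not space-efficient
        for x in elements:
            if not 0 <= x < m:
                raise ValueError(f"{x} is outside 0..{m - 1}")
            self.bits[x] = 1
        # prefix[b] = number of elements of S below position b * BLOCK
        self.prefix = [0]
        for start in range(0, m, self.BLOCK):
            self.prefix.append(self.prefix[-1] + sum(self.bits[start:start + self.BLOCK]))

    def rank(self, x: int) -> int:
        """Number of elements of S that are <= x (assumed convention)."""
        if x < 0 or self.m == 0:
            return 0
        x = min(x, self.m - 1)
        b = x // self.BLOCK
        # Sampled count for all earlier blocks, plus a scan inside x's own block.
        return self.prefix[b] + sum(self.bits[b * self.BLOCK:x + 1])

    def select(self, i: int) -> int:
        """The i-th smallest element of S (1-based), found by binary search on rank."""
        if not 1 <= i <= self.prefix[-1]:
            raise IndexError(i)
        lo, hi = 0, self.m - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if self.rank(mid) >= i:
                hi = mid
            else:
                lo = mid + 1
        return lo
```

With S = {2, 5, 64, 65, 130} and m = 200, rank(64) is 3 and select(5) is 130. The sampled prefix counts keep each rank query to one table lookup plus a scan of at most one block.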

Indexing compressed text

P Ferragina, G Manzini - Journal of the ACM (JACM), 2005 - dl.acm.org
We design two compressed data structures for the full-text indexing problem that support
efficient substring searches using roughly the space required for storing the text in …
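The compressed indexes of Ferragina and Manzini are built on the Burrows-Wheeler transform (BWT). The sketch below shows the backward-search idea on an uncompressed BWT. It assumes the text never contains the sentinel `$` (which sorts before letters and digits in ASCII), and it counts occurrences with plain `str.count`, where a real compressed index would use rank structures over a compressed representation. The function names are hypothetical.

```python
def bwt(text: str) -> str:
    """Burrows-Wheeler transform of text + '$' via a naive suffix sort."""
    t = text + "$"  # assumed sentinel: absent from text and smaller than its characters
    sa = sorted(range(len(t)), key=lambda i: t[i:])
    # Character preceding each sorted suffix; for i == 0, t[-1] wraps around to '$'.
    return "".join(t[i - 1] for i in sa)


def c_table(last: str) -> dict[str, int]:
    """C[c] = number of characters in the BWT strictly smaller than c."""
    table, total = {}, 0
    for ch in sorted(set(last)):
        table[ch] = total
        total += last.count(ch)
    return table


def count_occurrences(last: str, table: dict[str, int], pattern: str) -> int:
    """Backward search: number of occurrences of pattern in the original text."""
    lo, hi = 0, len(last)  # half-open range of sorted suffixes matched so far
    for ch in reversed(pattern):
        if ch not in table:
            return 0
        # LF-mapping: occurrences of ch before rows lo and hi give the new range.
        lo = table[ch] + last.count(ch, 0, lo)
        hi = table[ch] + last.count(ch, 0, hi)
        if lo >= hi:
            return 0
    return hi - lo
```

For example, `bwt("banana")` is `"annb$aa"`, and backward search for `"ana"` on it reports 2 occurrences. The search reads the pattern right to left and never consults the original text, which is why the text itself can be stored only in compressed form.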

Compressed suffix arrays and suffix trees with applications to text indexing and string matching

R Grossi, JS Vitter - Proceedings of the thirty-second annual ACM …, 2000 - dl.acm.org
The proliferation of online text, such as on the World Wide Web and in databases, motivates
the need for space-efficient index methods that support fast search. Consider a text T of n …
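A suffix array, the uncompressed structure that compressed suffix arrays represent more compactly, lists the starting positions of all suffixes of T in lexicographic order. Every occurrence of a pattern therefore corresponds to one contiguous block of that array, which two binary searches can locate. The naive sketch below (quadratic-space sorting, hypothetical function names) is for illustration only and is not the construction from the paper.

```python
def build_suffix_array(text: str) -> list[int]:
    """Naive construction: sort suffix start positions by the suffix they begin."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: list[int], pattern: str) -> list[int]:
    """All start positions of pattern in text, via binary search on the suffix array."""
    m = len(pattern)
    # Lower bound: first suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # Upper bound: first suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    # Suffixes in sa[start:lo] all begin with pattern; report positions in text order.
    return sorted(sa[start:lo])
```

For example, `build_suffix_array("banana")` is `[5, 3, 1, 0, 4, 2]`, and searching it for `"ana"` returns `[1, 3]`. Truncating sorted suffixes to their first m characters keeps them in sorted order, which is what makes the two binary searches valid.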

Bridging items and language: A transition paradigm for large language model-based recommendation

X Lin, W Wang, Y Li, F Feng, SK Ng… - Proceedings of the 30th …, 2024 - dl.acm.org
Harnessing Large Language Models (LLMs) for recommendation is rapidly emerging, which
relies on two fundamental steps to bridge the recommendation item space and the language …