Techniques for inverted index compression

GE Pibiri, R Venturini - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
The data structure at the core of large-scale search engines is the inverted index, which is
essentially a collection of sorted integer sequences called inverted lists. Because of the …

PISA: Performant indexes and search for academia

A Mallia, M Siedlaczek, J Mackenzie… - Proceedings of the Open …, 2019 - par.nsf.gov
Performant Indexes and Search for Academia (PISA) is an experimental search engine that
focuses on efficient implementations of state-of-the-art representations and algorithms for …

Efficient query processing for scalable web search

N Tonellotto, C Macdonald, I Ounis - Foundations and Trends® …, 2018 - nowpublishers.com
Search engines are exceptionally important tools for accessing information in today's world.
In satisfying the information needs of millions of users, the effectiveness (the quality of the …

Scalability challenges in web search engines

BB Cambazoglu, R Baeza-Yates - Advanced topics in information retrieval, 2011 - Springer
Continuous growth of the Web and user bases forces web search engine companies to
make costly investments on very large compute infrastructures. The scalability of these …

Faster BlockMax WAND with variable-sized blocks

A Mallia, G Ottaviano, E Porciani, N Tonellotto… - Proceedings of the 40th …, 2017 - dl.acm.org
Query processing is one of the main bottlenecks in large-scale search engines. Retrieving
the top k most relevant documents for a given query can be extremely expensive, as it …

CoCo-trie: Data-aware compression and indexing of strings

A Boffa, P Ferragina, F Tosoni, G Vinciguerra - Information Systems, 2024 - Elsevier
We address the problem of compressing and indexing a sorted dictionary of strings to
support efficient lookups and more sophisticated operations, such as prefix, predecessor …

Fast dictionary-based compression for inverted indexes

GE Pibiri, M Petri, A Moffat - … of the twelfth ACM international conference …, 2019 - dl.acm.org
Dictionary-based compression schemes provide fast decoding operation, typically at the
expense of reduced compression effectiveness compared to statistical or probability-based …

Compressing and querying integer dictionaries under linearities and repetitions

P Ferragina, G Manzini, G Vinciguerra - IEEE Access, 2022 - ieeexplore.ieee.org
We revisit the fundamental problem of compressing an integer dictionary that supports
efficient rank and select operations by exploiting simultaneously two kinds of regularities arising in real …

Index compression using byte-aligned ANS coding and two-dimensional contexts

A Moffat, M Petri - Proceedings of the Eleventh ACM International …, 2018 - dl.acm.org
We examine approaches used for block-based inverted index compression, such as the
OptPFOR mechanism, in which fixed-length blocks of postings data are compressed …

Clustered Elias-Fano indexes

GE Pibiri, R Venturini - ACM Transactions on Information Systems (TOIS), 2017 - dl.acm.org
State-of-the-art encoders for inverted indexes compress each posting list individually.
Encoding clusters of posting lists offers the possibility of reducing the redundancy of the lists …