Pyserini: A Python toolkit for reproducible information retrieval research with sparse and dense representations
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and
dense representations. It aims to provide effective, reproducible, and easy-to-use first-stage …
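A minimal sketch of the first-stage sparse retrieval the toolkit provides, using Pyserini's LuceneSearcher over a prebuilt index; the index name 'msmarco-v1-passage' is one of the toolkit's published prebuilt indexes (older releases use 'msmarco-passage'):

```python
# Hedged sketch: BM25 first-stage retrieval with Pyserini.
from pyserini.search.lucene import LuceneSearcher

# Download and open a prebuilt Lucene index (name assumed from the docs).
searcher = LuceneSearcher.from_prebuilt_index('msmarco-v1-passage')

# Retrieve the top-10 passages for a free-text query.
hits = searcher.search('what is information retrieval', k=10)
for i, hit in enumerate(hits):
    print(f'{i + 1:2} {hit.docid:15} {hit.score:.4f}')
```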
Pretrained transformers for text ranking: BERT and beyond
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in
response to a query. Although the most common formulation of text ranking is search …
PARADE: Passage Representation Aggregation for Document Reranking
Pre-trained transformer models, such as BERT and T5, have been shown to be highly effective at
ad hoc passage and document ranking. Due to the inherent sequence length limits of these …
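To illustrate the idea of representation aggregation: each passage of a long document is encoded separately, and the per-passage [CLS] vectors are pooled into a single document representation before scoring. A hedged sketch of the max-pooling variant (PARADE-Max); the dimensions and pooling choice are illustrative, not the paper's exact configuration:

```python
import torch

passage_cls = torch.randn(4, 768)          # [CLS] vectors for 4 passages of one document
doc_repr = passage_cls.max(dim=0).values   # PARADE-Max style pooling into one vector
scorer = torch.nn.Linear(768, 1)           # final relevance head
print(scorer(doc_repr).item())             # scalar relevance score for the document
```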
Simplified data wrangling with ir_datasets
Managing the data for Information Retrieval (IR) experiments can be challenging. Dataset
documentation is scattered across the Internet and once one obtains a copy of the data …
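A minimal sketch of the package's access pattern: datasets are addressed by identifier and expose typed iterators over documents, queries, and judgments. The identifier 'msmarco-passage/dev/small' is one of the documented dataset IDs:

```python
import ir_datasets

dataset = ir_datasets.load('msmarco-passage/dev/small')

# Queries and relevance judgments stream as namedtuples.
for query in dataset.queries_iter():
    print(query.query_id, query.text)
    break

for qrel in dataset.qrels_iter():
    print(qrel.query_id, qrel.doc_id, qrel.relevance)
    break
```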
Pyserini: An easy-to-use Python toolkit to support replicable IR research with sparse and dense representations
Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing
effective first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained …
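Complementing the sparse example above, a hedged sketch of dense first-stage retrieval with Pyserini's FaissSearcher; the prebuilt index and query-encoder names are assumptions based on the toolkit's published TCT-ColBERT resources:

```python
from pyserini.search.faiss import FaissSearcher, TctColBertQueryEncoder

# Encoder checkpoint and index name assumed from the Pyserini docs.
encoder = TctColBertQueryEncoder('castorini/tct_colbert-v2-hnp-msmarco')
searcher = FaissSearcher.from_prebuilt_index('msmarco-passage-tct_colbert-v2-hnp-bf', encoder)
hits = searcher.search('what is information retrieval', k=10)
for hit in hits:
    print(hit.docid, hit.score)
```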
Squeezing water from a stone: a bag of tricks for further improving cross-encoder effectiveness for reranking
While much recent work has demonstrated that hard negative mining can be used to train
better bi-encoder models, few have considered it in the context of cross-encoders, which are …
SPRINT: A unified toolkit for evaluating and demystifying zero-shot neural sparse retrieval
Traditionally, sparse retrieval systems that relied on lexical representations to retrieve
documents, such as BM25, dominated information retrieval tasks. With the onset of pre …
Comparing score aggregation approaches for document retrieval with pretrained transformers
While BERT has been shown to be effective for passage retrieval, its maximum input length
limitation poses a challenge when applying the model to document retrieval. In this work, we …
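The aggregation approaches being compared reduce per-passage relevance scores to one document score. A hedged, self-contained sketch of the common pooling choices (max, sum, average); the passage scores are hypothetical stand-ins for a neural ranker's output:

```python
from statistics import mean

def aggregate(passage_scores, how='max'):
    """Collapse per-passage scores into a single document score."""
    if how == 'max':   # best passage represents the document (MaxP-style)
        return max(passage_scores)
    if how == 'sum':   # accumulate evidence across passages (SumP-style)
        return sum(passage_scores)
    if how == 'avg':   # mean passage relevance (AvgP-style)
        return mean(passage_scores)
    raise ValueError(how)

scores = [0.12, 0.87, 0.45]   # hypothetical per-passage scores for one document
print({how: aggregate(scores, how) for how in ('max', 'sum', 'avg')})
```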
Axiomatic Retrieval Experimentation with ir_axioms
Axiomatic approaches to information retrieval have played a key role in determining basic
constraints that characterize good retrieval models. Beyond their importance in retrieval …
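As a concrete illustration of such a constraint, a hedged sketch checking the classic TFC1 axiom (with document length held fixed, more occurrences of a query term should not decrease the score) against a simplified stand-in scorer; this is not the ir_axioms API:

```python
def score(tf, doc_len, avg_len, k1=0.9, b=0.4):
    """Simplified BM25-style term weight (single query term, idf omitted)."""
    return (tf * (k1 + 1)) / (tf + k1 * (1 - b + b * doc_len / avg_len))

# TFC1: higher term frequency, same document length => score must not drop.
assert score(tf=3, doc_len=100, avg_len=100) >= score(tf=2, doc_len=100, avg_len=100)
```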
A little bit is worse than none: Ranking with limited training data
Researchers have proposed simple yet effective techniques for the retrieval problem based
on using BERT as a relevance classifier to rerank initial candidates from keyword search. In …
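The rerank-with-BERT setup the paper studies scores each query-candidate pair jointly with a cross-encoder. A hedged sketch using Hugging Face transformers; the checkpoint 'cross-encoder/ms-marco-MiniLM-L-6-v2' is an assumed example, and any sequence-classification reranker fits the same pattern:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = 'cross-encoder/ms-marco-MiniLM-L-6-v2'   # assumed reranker checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).eval()

query = 'what is information retrieval'
candidates = [
    'Information retrieval is the task of finding relevant material in a collection.',
    'The weather today is sunny with a light breeze.',
]

# Score each (query, candidate) pair jointly, then sort by score.
with torch.no_grad():
    inputs = tokenizer([query] * len(candidates), candidates,
                       padding=True, truncation=True, return_tensors='pt')
    scores = model(**inputs).logits.squeeze(-1)

for text, s in sorted(zip(candidates, scores.tolist()), key=lambda x: -x[1]):
    print(f'{s:7.3f}  {text}')
```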