Pyserini: A Python toolkit for reproducible information retrieval research with sparse and dense representations

J Lin, X Ma, SC Lin, JH Yang, R Pradeep… - Proceedings of the 44th …, 2021 - dl.acm.org
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and
dense representations. It aims to provide effective, reproducible, and easy-to-use first-stage …

Pretrained transformers for text ranking: BERT and beyond

J Lin, R Nogueira, A Yates - 2022 - books.google.com
The goal of text ranking is to generate an ordered list of texts retrieved from a corpus in
response to a query. Although the most common formulation of text ranking is search …

PARADE: Passage Representation Aggregation for Document Reranking

C Li, A Yates, S MacAvaney, B He, Y Sun - ACM Transactions on …, 2023 - dl.acm.org
Pre-trained transformer models, such as BERT and T5, have shown to be highly effective at
ad hoc passage and document ranking. Due to the inherent sequence length limits of these …

Simplified data wrangling with ir_datasets

S MacAvaney, A Yates, S Feldman, D Downey… - Proceedings of the 44th …, 2021 - dl.acm.org
Managing the data for Information Retrieval (IR) experiments can be challenging. Dataset
documentation is scattered across the Internet and once one obtains a copy of the data …

Pyserini: An easy-to-use Python toolkit to support replicable IR research with sparse and dense representations

J Lin, X Ma, SC Lin, JH Yang, R Pradeep… - arXiv preprint arXiv …, 2021 - arxiv.org
Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing
effective first-stage retrieval in a multi-stage ranking architecture. Our toolkit is self-contained …

Squeezing water from a stone: a bag of tricks for further improving cross-encoder effectiveness for reranking

R Pradeep, Y Liu, X Zhang, Y Li, A Yates… - European Conference on …, 2022 - Springer
While much recent work has demonstrated that hard negative mining can be used to train
better bi-encoder models, few have considered it in the context of cross-encoders, which are …

SPRINT: A unified toolkit for evaluating and demystifying zero-shot neural sparse retrieval

N Thakur, K Wang, I Gurevych, J Lin - Proceedings of the 46th …, 2023 - dl.acm.org
Traditionally, sparse retrieval systems relied on lexical representations to retrieve
documents, such as BM25, dominated information retrieval tasks. With the onset of pre …

Comparing score aggregation approaches for document retrieval with pretrained transformers

X Zhang, A Yates, J Lin - … Retrieval: 43rd European Conference on IR …, 2021 - Springer
While BERT has been shown to be effective for passage retrieval, its maximum input length
limitation poses a challenge when applying the model to document retrieval. In this work, we …

Axiomatic Retrieval Experimentation with ir_axioms

A Bondarenko, M Fröbe, JH Reimer, B Stein… - Proceedings of the 45th …, 2022 - dl.acm.org
Axiomatic approaches to information retrieval have played a key role in determining basic
constraints that characterize good retrieval models. Beyond their importance in retrieval …

A little bit is worse than none: Ranking with limited training data

X Zhang, A Yates, J Lin - … of SustaiNLP: Workshop on Simple and …, 2020 - aclanthology.org
Researchers have proposed simple yet effective techniques for the retrieval problem based
on using BERT as a relevance classifier to rerank initial candidates from keyword search. In …