Information retrieval: recent advances and beyond

KA Hambarde, H Proenca - IEEE Access, 2023 - ieeexplore.ieee.org
This paper provides an extensive and thorough overview of the models and techniques
utilized in the first and second stages of the typical information retrieval processing chain …

BEIR: A heterogenous benchmark for zero-shot evaluation of information retrieval models

N Thakur, N Reimers, A Rücklé, A Srivastava… - arXiv preprint arXiv …, 2021 - arxiv.org
Existing neural information retrieval (IR) models have often been studied in homogeneous
and narrow settings, which has considerably limited insights into their out-of-distribution …

Dense text retrieval based on pretrained language models: A survey

WX Zhao, J Liu, R Ren, JR Wen - ACM Transactions on Information …, 2024 - dl.acm.org
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources in response to users' queries in natural language. From …

SimLM: Pre-training with representation bottleneck for dense passage retrieval

L Wang, N Yang, X Huang, B Jiao, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we propose SimLM (Similarity matching with Language Model pre-training), a
simple yet effective pre-training method for dense passage retrieval. It employs a simple …

InPars: Data augmentation for information retrieval using large language models

L Bonifacio, H Abonizio, M Fadaee… - arXiv preprint arXiv …, 2022 - arxiv.org
The information retrieval community has recently witnessed a revolution due to large
pretrained transformer models. Another key ingredient for this revolution was the MS …

mMARCO: A multilingual version of the MS MARCO passage ranking dataset

L Bonifacio, V Jeronymo, HQ Abonizio… - arXiv preprint arXiv …, 2021 - arxiv.org
The MS MARCO ranking dataset has been widely used for training deep learning models for
IR tasks, achieving considerable effectiveness on diverse zero-shot scenarios. However, this …

Scaling laws for dense retrieval

Y Fang, J Zhan, Q Ai, J Mao, W Su, J Chen… - Proceedings of the 47th …, 2024 - dl.acm.org
Scaling laws have been observed in a wide range of tasks, particularly in language
generation. Previous studies have found that the performance of large language models …

ERNIE-Search: Bridging cross-encoder with dual-encoder via self on-the-fly distillation for dense passage retrieval

Y Lu, Y Liu, J Liu, Y Shi, Z Huang, SFY Sun… - arXiv preprint arXiv …, 2022 - arxiv.org
Neural retrievers based on pre-trained language models (PLMs), such as dual-encoders,
have achieved promising performance on the task of open-domain question answering …

End-to-end multimodal fact-checking and explanation generation: A challenging dataset and models

BM Yao, A Shah, L Sun, JH Cho, L Huang - Proceedings of the 46th …, 2023 - dl.acm.org
We propose end-to-end multimodal fact-checking and explanation generation, where the
input is a claim and a large collection of web sources, including articles, images, videos, and …

Salient phrase aware dense retrieval: can a dense retriever imitate a sparse one?

X Chen, K Lakhotia, B Oğuz, A Gupta, P Lewis… - arXiv preprint arXiv …, 2021 - arxiv.org
Despite their recent popularity and well-known advantages, dense retrievers still lag behind
sparse methods such as BM25 in their ability to reliably match salient phrases and rare …