Information retrieval: recent advances and beyond
This paper provides an extensive and thorough overview of the models and techniques
utilized in the first and second stages of the typical information retrieval processing chain …
BEIR: A heterogenous benchmark for zero-shot evaluation of information retrieval models
Existing neural information retrieval (IR) models have often been studied in homogeneous
and narrow settings, which has considerably limited insights into their out-of-distribution …
Dense text retrieval based on pretrained language models: A survey
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources to user's queries in natural language. From …
SimLM: Pre-training with representation bottleneck for dense passage retrieval
In this paper, we propose SimLM (Similarity matching with Language Model pre-training), a
simple yet effective pre-training method for dense passage retrieval. It employs a simple …
InPars: Data augmentation for information retrieval using large language models
The information retrieval community has recently witnessed a revolution due to large
pretrained transformer models. Another key ingredient for this revolution was the MS …
mMARCO: A multilingual version of the MS MARCO passage ranking dataset
The MS MARCO ranking dataset has been widely used for training deep learning models for
IR tasks, achieving considerable effectiveness on diverse zero-shot scenarios. However, this …
Scaling laws for dense retrieval
Scaling laws have been observed in a wide range of tasks, particularly in language
generation. Previous studies have found that the performance of large language models …
ERNIE-Search: Bridging cross-encoder with dual-encoder via self on-the-fly distillation for dense passage retrieval
Neural retrievers based on pre-trained language models (PLMs), such as dual-encoders,
have achieved promising performance on the task of open-domain question answering …
End-to-end multimodal fact-checking and explanation generation: A challenging dataset and models
We propose end-to-end multimodal fact-checking and explanation generation, where the
input is a claim and a large collection of web sources, including articles, images, videos, and …
Salient phrase aware dense retrieval: Can a dense retriever imitate a sparse one?
Despite their recent popularity and well-known advantages, dense retrievers still lag behind
sparse methods such as BM25 in their ability to reliably match salient phrases and rare …