Information retrieval: recent advances and beyond

KA Hambarde, H Proenca - IEEE Access, 2023 - ieeexplore.ieee.org
This paper provides an extensive and thorough overview of the models and techniques
utilized in the first and second stages of the typical information retrieval processing chain …

Text embeddings by weakly-supervised contrastive pre-training

L Wang, N Yang, X Huang, B Jiao, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a
wide range of tasks. The model is trained in a contrastive manner with weak supervision …
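The contrastive objective mentioned in the E5 entry can be illustrated with an InfoNCE loss that uses in-batch negatives. This is a minimal sketch under stated assumptions, not E5's training code. The function names, the pure-Python cosine similarity, and the default temperature of 0.05 are choices made here for illustration. Each query's positive is assumed to be the passage at the same batch index, and every other passage in the batch acts as a negative.

```python
import math
from typing import List

Vector = List[float]

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce_loss(queries: List[Vector], passages: List[Vector],
                  temperature: float = 0.05) -> float:
    """Mean InfoNCE loss: passages[i] is the positive for queries[i],
    all other passages in the batch serve as negatives.
    The default temperature is an illustrative assumption."""
    assert len(queries) == len(passages)
    total = 0.0
    for i, q in enumerate(queries):
        logits = [cosine(q, p) / temperature for p in passages]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        # Negative log-probability of the positive passage.
        total += log_denom - logits[i]
    return total / len(queries)

q = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce_loss(q, [[1.0, 0.0], [0.0, 1.0]])
swapped = info_nce_loss(q, [[0.0, 1.0], [1.0, 0.0]])
print(aligned < swapped)  # True
```

With correctly aligned pairs the loss is close to zero; swapping the positives drives it up sharply, which is the signal the encoder is trained to reduce.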

BEIR: A heterogenous benchmark for zero-shot evaluation of information retrieval models

N Thakur, N Reimers, A Rücklé, A Srivastava… - arXiv preprint arXiv …, 2021 - arxiv.org
Existing neural information retrieval (IR) models have often been studied in homogeneous
and narrow settings, which has considerably limited insights into their out-of-distribution …

ColBERTv2: Effective and efficient retrieval via lightweight late interaction

K Santhanam, O Khattab, J Saad-Falcon… - arXiv preprint arXiv …, 2021 - arxiv.org
Neural information retrieval (IR) has greatly advanced search and other knowledge-
intensive language tasks. While many neural IR methods encode queries and documents …
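The "late interaction" in the ColBERTv2 title refers to comparing per-token query and document embeddings only at scoring time, rather than pooling each side into a single vector. The sketch below shows only the MaxSim operator from the original ColBERT, assuming token embeddings are already computed and L2-normalized. ColBERTv2's residual compression and its training procedure are not modeled, and the helper names and toy two-dimensional vectors are hypothetical.

```python
from typing import Dict, List

Vector = List[float]

def dot(u: Vector, v: Vector) -> float:
    """Inner product; equals cosine similarity for L2-normalized vectors."""
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_tokens: List[Vector], doc_tokens: List[Vector]) -> float:
    """For each query token, take its best similarity over all document
    tokens, then sum these maxima over the query."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

def rank(query_tokens: List[Vector], docs: Dict[str, List[Vector]]) -> List[str]:
    """Return document ids ordered by descending MaxSim score."""
    return sorted(docs, key=lambda doc_id: maxsim_score(query_tokens, docs[doc_id]),
                  reverse=True)

# Toy example with hypothetical, already-normalized token embeddings.
query = [[1.0, 0.0], [0.0, 1.0]]
docs = {
    "covers_both_terms": [[1.0, 0.0], [0.0, 1.0]],
    "covers_one_term": [[1.0, 0.0], [1.0, 0.0]],
}
print(rank(query, docs))  # ['covers_both_terms', 'covers_one_term']
```

Because document token embeddings can be computed offline, only the inexpensive max-and-sum step has to run at query time.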

Dense text retrieval based on pretrained language models: A survey

WX Zhao, J Liu, R Ren, JR Wen - ACM Transactions on Information …, 2024 - dl.acm.org
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources to users' queries in natural language. From …

Query2doc: Query expansion with large language models

L Wang, N Yang, F Wei - arXiv preprint arXiv:2303.07678, 2023 - arxiv.org
This paper introduces a simple yet effective query expansion approach, denoted as
query2doc, to improve both sparse and dense retrieval systems. The proposed method first …
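The query-expansion idea in the Query2doc entry can be sketched as string concatenation, assuming an LLM has already written a pseudo-document for the query (hard-coded below rather than generated). Repeating the original query for sparse retrieval, the default of five repeats, and the `[SEP]` separator for dense retrieval are assumptions made for this illustration; `expand_query` is a hypothetical helper, not part of any released library.

```python
def expand_query(query: str, pseudo_doc: str, mode: str = "sparse",
                 n_repeats: int = 5) -> str:
    """Build an expanded query from the original query and an
    LLM-written pseudo-document (hypothetical helper).

    sparse: repeat the query so its terms keep weight next to the longer
            pseudo-document under a lexical scorer such as BM25.
    dense:  join query and pseudo-document with an assumed separator token.
    """
    if mode == "sparse":
        return " ".join([query] * n_repeats + [pseudo_doc])
    if mode == "dense":
        return f"{query} [SEP] {pseudo_doc}"
    raise ValueError(f"unknown mode: {mode!r}")

q = "what is bm25"
d = "BM25 is a ranking function used by search engines."  # stand-in for LLM output
print(expand_query(q, d, mode="sparse", n_repeats=2))
# what is bm25 what is bm25 BM25 is a ranking function used by search engines.
```

The expanded string is then passed unchanged to whichever sparse or dense retriever is in use, so the retriever itself needs no modification.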

Task-aware retrieval with instructions

A Asai, T Schick, P Lewis, X Chen, G Izacard… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the problem of retrieval with instructions, where users of a retrieval system
explicitly describe their intent along with their queries. We aim to develop a general-purpose …

The dawn after the dark: An empirical study on factuality hallucination in large language models

J Li, J Chen, R Ren, X Cheng, WX Zhao, JY Nie… - arXiv preprint arXiv …, 2024 - arxiv.org
In the era of large language models (LLMs), hallucination (i.e., the tendency to generate
factually incorrect content) poses a great challenge to trustworthy and reliable deployment of …

Simplified data wrangling with ir_datasets

S MacAvaney, A Yates, S Feldman, D Downey… - Proceedings of the 44th …, 2021 - dl.acm.org
Managing the data for Information Retrieval (IR) experiments can be challenging. Dataset
documentation is scattered across the Internet and once one obtains a copy of the data …

COCO-DR: Combating distribution shifts in zero-shot dense retrieval with contrastive and distributionally robust learning

Y Yu, C Xiong, S Sun, C Zhang, A Overwijk - arXiv preprint arXiv …, 2022 - arxiv.org
We present a new zero-shot dense retrieval (ZeroDR) method, COCO-DR, to improve the
generalization ability of dense retrieval by combating the distribution shifts between source …