Information retrieval: recent advances and beyond

KA Hambarde, H Proenca - IEEE Access, 2023 - ieeexplore.ieee.org
This paper provides an extensive and thorough overview of the models and techniques
utilized in the first and second stages of the typical information retrieval processing chain …

Semantic models for the first-stage retrieval: A comprehensive review

J Guo, Y Cai, Y Fan, F Sun, R Zhang… - ACM Transactions on …, 2022 - dl.acm.org
Multi-stage ranking pipelines have been a practical solution in modern search systems,
where the first-stage retrieval is to return a subset of candidate documents and latter stages …

BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation

J Chen, S Xiao, P Zhang, K Luo, D Lian… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we present a new embedding model, called M3-Embedding, which is
distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It …

Dense text retrieval based on pretrained language models: A survey

WX Zhao, J Liu, R Ren, JR Wen - ACM Transactions on Information …, 2024 - dl.acm.org
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources to user's queries in natural language. From …

ColBERTv2: Effective and efficient retrieval via lightweight late interaction

K Santhanam, O Khattab, J Saad-Falcon… - arXiv preprint arXiv …, 2021 - arxiv.org
Neural information retrieval (IR) has greatly advanced search and other knowledge-
intensive language tasks. While many neural IR methods encode queries and documents …

BEIR: A heterogenous benchmark for zero-shot evaluation of information retrieval models

N Thakur, N Reimers, A Rücklé, A Srivastava… - arXiv preprint arXiv …, 2021 - arxiv.org
Existing neural information retrieval (IR) models have often been studied in homogeneous
and narrow settings, which has considerably limited insights into their out-of-distribution …

MedCPT: Contrastive pre-trained transformers with large-scale PubMed search logs for zero-shot biomedical information retrieval

Q Jin, W Kim, Q Chen, DC Comeau, L Yeganova… - …, 2023 - academic.oup.com
Motivation: Information retrieval (IR) is essential in biomedical knowledge acquisition and
clinical decision support. While recent progress has shown that language model encoders …

In-batch negatives for knowledge distillation with tightly-coupled teachers for dense retrieval

SC Lin, JH Yang, J Lin - Proceedings of the 6th Workshop on …, 2021 - aclanthology.org
We present an efficient training approach to text retrieval with dense representations that
applies knowledge distillation using the ColBERT late-interaction ranking model …

M3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation

J Chen, S Xiao, P Zhang, K Luo, D Lian… - Findings of the …, 2024 - aclanthology.org
In this paper, we introduce a new embedding model called M3-Embedding, which is
distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It …

Pre-training methods in information retrieval

Y Fan, X Xie, Y Cai, J Chen, X Ma, X Li… - … and Trends® in …, 2022 - nowpublishers.com
The core of information retrieval (IR) is to identify relevant information from large-scale
resources and return it as a ranked list to respond to user's information need. In recent years …