Information retrieval: recent advances and beyond

KA Hambarde, H Proenca - IEEE Access, 2023 - ieeexplore.ieee.org
This paper provides an extensive and thorough overview of the models and techniques
utilized in the first and second stages of the typical information retrieval processing chain …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang… - arXiv preprint arXiv …, 2023 - paper-notes.zhjwpku.com
Ever since the Turing Test was proposed in the 1950s, humans have explored the mastering
of language intelligence by machine. Language is essentially a complex, intricate system of …

BGE M3-Embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distillation

J Chen, S Xiao, P Zhang, K Luo, D Lian… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we present a new embedding model, called M3-Embedding, which is
distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It …

Text embeddings by weakly-supervised contrastive pre-training

L Wang, N Yang, X Huang, B Jiao, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a
wide range of tasks. The model is trained in a contrastive manner with weak supervision …

Query2doc: Query expansion with large language models

L Wang, N Yang, F Wei - arXiv preprint arXiv:2303.07678, 2023 - arxiv.org
This paper introduces a simple yet effective query expansion approach, denoted as
query2doc, to improve both sparse and dense retrieval systems. The proposed method first …

Dense text retrieval based on pretrained language models: A survey

WX Zhao, J Liu, R Ren, JR Wen - ACM Transactions on Information …, 2024 - dl.acm.org
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources to user's queries in natural language. From …

Promptagator: Few-shot dense retrieval from 8 examples

Z Dai, VY Zhao, J Ma, Y Luan, J Ni, J Lu… - arXiv preprint arXiv …, 2022 - arxiv.org
Much recent research on information retrieval has focused on how to transfer from one task
(typically with abundant supervised data) to various other tasks where supervision is limited …

ColBERTv2: Effective and efficient retrieval via lightweight late interaction

K Santhanam, O Khattab, J Saad-Falcon… - arXiv preprint arXiv …, 2021 - arxiv.org
Neural information retrieval (IR) has greatly advanced search and other knowledge-
intensive language tasks. While many neural IR methods encode queries and documents …

Evaluating open-domain question answering in the era of large language models

E Kamalloo, N Dziri, CLA Clarke, D Rafiei - arXiv preprint arXiv …, 2023 - arxiv.org
Lexical matching remains the de facto evaluation method for open-domain question
answering (QA). Unfortunately, lexical matching fails completely when a plausible candidate …

RankT5: Fine-tuning T5 for text ranking with ranking losses

H Zhuang, Z Qin, R Jagerman, K Hui, J Ma… - Proceedings of the 46th …, 2023 - dl.acm.org
Pretrained language models such as BERT have been shown to be exceptionally effective
for text ranking. However, there are limited studies on how to leverage more powerful …