Text embeddings by weakly-supervised contrastive pre-training

L Wang, N Yang, X Huang, B Jiao, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a
wide range of tasks. The model is trained in a contrastive manner with weak supervision …

Dense text retrieval based on pretrained language models: A survey

WX Zhao, J Liu, R Ren, JR Wen - ACM Transactions on Information …, 2024 - dl.acm.org
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources in response to users' queries in natural language. From …

Promptagator: Few-shot dense retrieval from 8 examples

Z Dai, VY Zhao, J Ma, Y Luan, J Ni, J Lu… - arXiv preprint arXiv …, 2022 - arxiv.org
Much recent research on information retrieval has focused on how to transfer from one task
(typically with abundant supervised data) to various other tasks where supervision is limited …

How to train your dragon: Diverse augmentation towards generalizable dense retrieval

SC Lin, A Asai, M Li, B Oguz, J Lin, Y Mehdad… - arXiv preprint arXiv …, 2023 - arxiv.org
Various techniques have been developed in recent years to improve dense retrieval (DR),
such as unsupervised contrastive learning and pseudo-query generation. Existing DRs …

Improving passage retrieval with zero-shot question generation

DS Sachan, M Lewis, M Joshi, A Aghajanyan… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose a simple and effective re-ranking method for improving passage retrieval in
open question answering. The re-ranker re-scores retrieved passages with a zero-shot …

Task-aware retrieval with instructions

A Asai, T Schick, P Lewis, X Chen, G Izacard… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the problem of retrieval with instructions, where users of a retrieval system
explicitly describe their intent along with their queries. We aim to develop a general-purpose …

Dense X retrieval: What retrieval granularity should we use?

T Chen, H Wang, S Chen, W Yu, K Ma… - Proceedings of the …, 2024 - aclanthology.org
Dense retrieval has become a prominent method to obtain relevant context or world
knowledge in open-domain NLP tasks. When we use a learned dense retriever on a …

SimLM: Pre-training with representation bottleneck for dense passage retrieval

L Wang, N Yang, X Huang, B Jiao, L Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we propose SimLM (Similarity matching with Language Model pre-training), a
simple yet effective pre-training method for dense passage retrieval. It employs a simple …

Rethinking the role of token retrieval in multi-vector retrieval

J Lee, Z Dai, SMK Duddu, T Lei… - Advances in …, 2023 - proceedings.neurips.cc
Multi-vector retrieval models such as ColBERT [Khattab et al., 2020] allow token-level
interactions between queries and documents, and hence achieve state of the art on many …

Towards robust ranker for text retrieval

Y Zhou, T Shen, X Geng, C Tao, C Xu, G Long… - arXiv preprint arXiv …, 2022 - arxiv.org
A ranker plays an indispensable role in the de facto 'retrieval & rerank' pipeline, but its
training still lags behind -- learning from moderate negatives and/or serving as an auxiliary …