Text embeddings by weakly-supervised contrastive pre-training
This paper presents E5, a family of state-of-the-art text embeddings that transfer well to a
wide range of tasks. The model is trained in a contrastive manner with weak supervision …
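The core training signal is an InfoNCE-style contrastive loss over weakly paired (query, passage) text with in-batch negatives. Below is a minimal sketch of that objective; the temperature value and tensor shapes are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor,
                  passage_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """query_emb, passage_emb: (batch, dim); row i of each is a positive pair.
    Every other passage in the batch serves as an in-batch negative."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    logits = q @ p.T / temperature                      # (batch, batch) cosine similarities
    targets = torch.arange(q.size(0), device=q.device)  # diagonal entries are the positives
    return F.cross_entropy(logits, targets)
```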
ColBERTv2: Effective and efficient retrieval via lightweight late interaction
Neural information retrieval (IR) has greatly advanced search and other knowledge-
intensive language tasks. While many neural IR methods encode queries and documents …
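The "late interaction" the snippet refers to scores a query-document pair by matching per-token embeddings rather than single vectors: each query token takes its maximum similarity over all document tokens (MaxSim), and the per-token maxima are summed. A minimal sketch of that scoring rule, with tensor shapes assumed for illustration:

```python
import torch
import torch.nn.functional as F

def maxsim_score(query_tokens: torch.Tensor,  # (q_len, dim) per-token query embeddings
                 doc_tokens: torch.Tensor     # (d_len, dim) per-token document embeddings
                 ) -> torch.Tensor:
    q = F.normalize(query_tokens, dim=-1)
    d = F.normalize(doc_tokens, dim=-1)
    sim = q @ d.T                       # (q_len, d_len) token-level similarities
    return sim.max(dim=1).values.sum()  # MaxSim per query token, summed over the query
```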
Unsupervised corpus aware language model pre-training for dense passage retrieval
L. Gao and J. Callan. arXiv preprint arXiv:2108.05540, 2021.
Recent research demonstrates the effectiveness of using fine-tuned language models~(LM)
for dense retrieval. However, dense retrievers are hard to train, typically requiring heavily …
Promptagator: Few-shot dense retrieval from 8 examples
Much recent research on information retrieval has focused on how to transfer from one task
(typically with abundant supervised data) to various other tasks where supervision is limited …
Autoregressive search engines: Generating substrings as document identifiers
Abstract Knowledge-intensive language tasks require NLP systems to both provide the
correct answer and retrieve supporting evidence for it in a given corpus. Autoregressive …
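The retriever here generates substrings that are constrained to occur verbatim in the corpus (the paper uses an FM-index for this) and maps them back to the documents containing them. The toy index below stands in for the FM-index purely to illustrate the decoding constraint; all names are assumptions, not the paper's implementation.

```python
from collections import defaultdict

def build_substring_index(docs, max_len=5):
    """Toy substitute for an FM-index: map every corpus n-gram
    (up to max_len tokens) to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        tokens = doc.split()
        for i in range(len(tokens)):
            for j in range(i + 1, min(i + 1 + max_len, len(tokens) + 1)):
                index[tuple(tokens[i:j])].add(doc_id)
    return index

def allowed_next_tokens(index, prefix):
    """Constrained decoding step: only tokens that extend `prefix`
    into another real corpus substring may be generated."""
    return {key[len(prefix)] for key in index
            if len(key) > len(prefix) and key[:len(prefix)] == prefix}
```

At decode time, the generator's vocabulary is masked to `allowed_next_tokens(index, prefix)` at each step, so every finished string identifies at least one real document.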
Dense text retrieval based on pretrained language models: A survey
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources to users' queries in natural language. From …
GPL: Generative pseudo labeling for unsupervised domain adaptation of dense retrieval
Dense retrieval approaches can overcome the lexical gap and lead to significantly improved
search results. However, they require large amounts of training data which is not available …
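Concretely, GPL synthesizes queries for unlabeled passages with a query generator, mines negatives with an existing retriever, scores each (query, positive, negative) triple with a cross-encoder, and distills the score gap into the dense retriever. A minimal sketch of that final MarginMSE step, assuming the pseudo labels have already been computed:

```python
import torch
import torch.nn.functional as F

def margin_mse_loss(q_emb: torch.Tensor,            # (batch, dim) student query embeddings
                    pos_emb: torch.Tensor,          # (batch, dim) pseudo-positive passages
                    neg_emb: torch.Tensor,          # (batch, dim) mined negative passages
                    teacher_margin: torch.Tensor    # (batch,) cross-encoder score(pos) - score(neg)
                    ) -> torch.Tensor:
    """Train the retriever so its dot-product margin matches the teacher's."""
    student_margin = (q_emb * pos_emb).sum(-1) - (q_emb * neg_emb).sum(-1)
    return F.mse_loss(student_margin, teacher_margin)
```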
Adversarial retriever-ranker for dense text retrieval
Current dense text retrieval models face two typical challenges. First, they adopt a siamese
dual-encoder architecture to encode queries and documents independently for fast indexing …
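The siamese dual-encoder setup described here encodes queries and documents independently, which is what makes fast indexing possible: document vectors are precomputed once offline, and retrieval reduces to a top-k similarity search. A minimal sketch of that retrieval step (the adversarial retriever-ranker training itself is not shown):

```python
import torch

@torch.no_grad()
def retrieve(query_emb: torch.Tensor,   # (dim,) encoded independently of any document
             doc_index: torch.Tensor,   # (num_docs, dim) precomputed offline
             k: int = 10):
    scores = doc_index @ query_emb      # dot-product relevance for every document
    return torch.topk(scores, k)        # top-k scores and document ids
```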
Dense X retrieval: What retrieval granularity should we use?
Dense retrieval has become a prominent method to obtain relevant context or world
knowledge in open-domain NLP tasks. When we use a learned dense retriever on a …
SimLM: Pre-training with representation bottleneck for dense passage retrieval
In this paper, we propose SimLM (Similarity matching with Language Model pre-training), a
simple yet effective pre-training method for dense passage retrieval. It employs a simple …
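The "representation bottleneck" idea: a deep encoder must compress a passage into a single vector, and a deliberately shallow decoder reconstructs the text conditioned on that vector, forcing the bottleneck embedding to carry most of the passage's information. The sketch below uses illustrative layer counts and a plain token-reconstruction head; these are simplifying assumptions, not SimLM's exact (replaced-token-detection-style) objective.

```python
import torch
import torch.nn as nn

class BottleneckPretrainer(nn.Module):
    def __init__(self, vocab_size: int = 30522, dim: int = 768):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        enc_layer = nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=12)  # deep encoder
        self.decoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, nhead=12, batch_first=True),
            num_layers=2)                                               # shallow decoder
        self.lm_head = nn.Linear(dim, vocab_size)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:  # (batch, seq)
        h = self.encoder(self.embed(input_ids))
        cls = h[:, :1]                             # bottleneck: the [CLS] vector only
        dec_in = torch.cat([cls, self.embed(input_ids[:, 1:])], dim=1)
        return self.lm_head(self.decoder(dec_in))  # reconstruct tokens from the bottleneck
```

Keeping the decoder shallow is the design point: a weak decoder cannot recover the passage on its own, so the pre-training loss pushes the information into the [CLS] bottleneck that later serves as the retrieval embedding.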