Fine-Tuning LLaMA for Multi-Stage Text Retrieval
While large language models (LLMs) have shown impressive NLP capabilities, existing IR
applications mainly focus on prompting LLMs to generate query expansions or generating …
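This paper's recipe (RepLLaMA for retrieval, RankLLaMA for reranking) represents each input with the hidden state of an appended end-of-sequence token. A minimal sketch of the retrieval side, assuming a Hugging Face causal-LM checkpoint; the `query:`/`passage:` prefixes follow the paper, everything else is illustrative:

```python
# Sketch of RepLLaMA-style dense retrieval: encode query and passage with a
# decoder-only LM and use the final (</s>) token's hidden state as the vector.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL = "meta-llama/Llama-2-7b-hf"  # assumption: any causal-LM checkpoint works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL)

def embed(text: str, prefix: str) -> torch.Tensor:
    ids = tok(f"{prefix}: {text}</s>", return_tensors="pt")
    with torch.no_grad():
        hidden = model(**ids).last_hidden_state
    vec = hidden[0, -1]          # representation of the appended </s> token
    return vec / vec.norm()      # normalise so dot product = cosine similarity

q = embed("how can llamas rank passages", "query")
p = embed("LLaMA can be fine-tuned as a retriever and a reranker.", "passage")
score = q @ p                    # higher = more relevant
```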
RankVicuna: Zero-Shot Listwise Document Reranking with Open-Source Large Language Models
Researchers have successfully applied large language models (LLMs) such as ChatGPT to
reranking in an information retrieval context, but to date, such work has mostly been built on …
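The listwise setup these rerankers share is easy to show concretely: candidates are numbered inside a single prompt and the model is asked to emit an ordering such as `[2] > [3] > [1]`. A hedged sketch of the prompt construction and output parsing (the LLM call itself is omitted; the prompt wording is illustrative, not RankVicuna's exact template):

```python
# Sketch of listwise LLM reranking: number the candidates in one prompt,
# ask for a ranking string, then parse and repair the returned permutation.
import re

def build_prompt(query: str, passages: list[str]) -> str:
    lines = [f"I will provide {len(passages)} passages, each marked with a "
             f"numerical identifier []. Rank them by relevance to the query.\n"]
    for i, p in enumerate(passages, 1):
        lines.append(f"[{i}] {p}")
    lines.append(f"\nSearch Query: {query}\n"
                 "Rank the passages above. Output the ranking like [2] > [1].")
    return "\n".join(lines)

def parse_ranking(response: str, n: int) -> list[int]:
    seen = [int(m) for m in re.findall(r"\[(\d+)\]", response)]
    order = [i for i in dict.fromkeys(seen) if 1 <= i <= n]  # dedupe, validate
    order += [i for i in range(1, n + 1) if i not in order]  # repair omissions
    return order
```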
RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!
In information retrieval, proprietary large language models (LLMs) such as GPT-4 and open-
source counterparts such as LLaMA and Vicuna have played a vital role in reranking …
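Because only around 20 passages fit in one prompt, rerankers in this family process longer candidate lists with a sliding window that moves from the bottom of the ranking to the top, letting strong passages bubble upward. A sketch, where `rerank_window` stands in for one listwise LLM call and the window/stride values (20/10) are the commonly used defaults:

```python
# Sketch of sliding-window listwise reranking over a long candidate list.
# The window slides bottom-up with stride s so relevant passages can climb.
def sliding_window_rerank(items, rerank_window, w=20, s=10):
    items = list(items)
    start = max(len(items) - w, 0)
    while True:
        items[start:start + w] = rerank_window(items[start:start + w])
        if start == 0:
            break
        start = max(start - s, 0)
    return items

# Usage: reranked = sliding_window_rerank(bm25_top100, rerank_window=llm_call)
```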
Doc2Query--: When Less is More
Doc2Query—the process of expanding the content of a document before indexing using a
sequence-to-sequence model—has emerged as a prominent technique for improving the …
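Doc2Query-- adds a filtering step to this expansion pipeline: generated queries are scored against their source passage and only the highest-scoring ones are appended before indexing. A sketch assuming the public docT5query checkpoint; `score_relevance` stands in for a cross-encoder and the 30% retention rate is an illustrative choice (the paper tunes the filter threshold):

```python
# Sketch of Doc2Query--: generate expansion queries with a seq2seq model,
# then keep only those a relevance model judges the passage to answer.
from transformers import T5ForConditionalGeneration, T5Tokenizer

NAME = "castorini/doc2query-t5-base-msmarco"
tok = T5Tokenizer.from_pretrained(NAME)
gen = T5ForConditionalGeneration.from_pretrained(NAME)

def expand(passage: str, score_relevance, n=20, keep=0.3) -> str:
    ids = tok(passage, return_tensors="pt", truncation=True).input_ids
    out = gen.generate(ids, max_length=64, do_sample=True, top_k=10,
                       num_return_sequences=n)
    queries = [tok.decode(o, skip_special_tokens=True) for o in out]
    best = sorted(queries, key=lambda q: score_relevance(q, passage),
                  reverse=True)[: int(keep * n)]   # drop low-quality expansions
    return passage + " " + " ".join(best)
```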
The Tale of Two MS MARCO - and Their Unfair Comparisons
The MS MARCO passage dataset has been the main large-scale dataset open to the IR
community, and it has successfully fostered the development of novel neural retrieval models …
Simple yet effective neural ranking and reranking baselines for cross-lingual information retrieval
The advent of multilingual language models has generated a resurgence of interest in cross-
lingual information retrieval (CLIR), which is the task of searching documents in one …
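One simple CLIR baseline of the kind this paper benchmarks maps queries and documents from different languages into a shared embedding space with a multilingual encoder, then ranks by cosine similarity. A sketch using one public multilingual checkpoint (an assumption, not necessarily the paper's exact setup, which also covers translate-then-retrieve pipelines):

```python
# Sketch of dense CLIR: an English query retrieves Spanish documents because
# the multilingual encoder embeds both languages into one vector space.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

query = "economic impact of climate change"                  # English query
docs = ["Los efectos económicos del cambio climático ...",   # Spanish docs
        "La política monetaria del banco central ..."]

q_emb = model.encode(query, convert_to_tensor=True)
d_emb = model.encode(docs, convert_to_tensor=True)
scores = util.cos_sim(q_emb, d_emb)[0]            # one score per document
ranking = scores.argsort(descending=True)         # best-matching doc first
```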
Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models
Listwise rerankers based on large language models (LLMs) are the zero-shot state of the art.
However, current works in this direction all depend on GPT models, making it a single …
JaColBERTv2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources
B Clavié - arXiv preprint arXiv:2407.20750, 2024 - arxiv.org
Neural Information Retrieval has advanced rapidly in high-resource languages, but progress
in lower-resource ones such as Japanese has been hindered by data scarcity, among other …
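JaColBERT, like ColBERT, is a multi-vector retriever: it keeps one embedding per token and scores with late interaction (MaxSim), where each query token is matched to its best document token and the per-token maxima are summed. A minimal sketch of that scoring function with random stand-in embeddings:

```python
# Sketch of late-interaction (MaxSim) scoring used by ColBERT-style models.
import torch

def maxsim_score(Q: torch.Tensor, D: torch.Tensor) -> torch.Tensor:
    """Q: [num_query_tokens, dim], D: [num_doc_tokens, dim], L2-normalised."""
    sim = Q @ D.T                        # token-to-token cosine similarities
    return sim.max(dim=1).values.sum()   # best doc match per query token, summed

Q = torch.nn.functional.normalize(torch.randn(32, 128), dim=-1)
D = torch.nn.functional.normalize(torch.randn(180, 128), dim=-1)
print(maxsim_score(Q, D))
```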
Neural Passage Quality Estimation for Static Pruning
X Chang, D Mishra, C Macdonald… - Proceedings of the 47th …, 2024 - dl.acm.org
Neural networks, especially those that use large, pre-trained language models, have
improved search engines in various ways. Most prominently, they can estimate the …
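Static pruning here means deciding at indexing time which passages to drop entirely, guided by a learned passage-quality estimate. A sketch of the overall loop, where `quality_fn` is a hypothetical stand-in for the paper's neural quality model and the 10% pruning fraction is illustrative:

```python
# Sketch of quality-based static pruning: score every passage once before
# indexing and drop the lowest-scoring fraction from the collection.
def prune_collection(passages, quality_fn, prune_frac=0.1):
    scored = sorted(passages, key=quality_fn)   # ascending estimated quality
    cutoff = int(len(scored) * prune_frac)
    return scored[cutoff:]                      # index only what survives

# Usage: index(prune_collection(corpus, quality_model.score))
```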
A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking
Cross-encoders distilled from large language models are more effective re-rankers than
cross-encoders fine-tuned using manually labeled data. However, the distilled models do …
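One common distillation setup in this space trains the student cross-encoder to reproduce a teacher LLM's ordering via a pairwise RankNet-style loss. A sketch of that loss over student scores for passages pre-sorted by the teacher (an illustration of the general technique, not this paper's specific objective):

```python
# Sketch of RankNet-style distillation: for passages sorted best-first by the
# teacher, every earlier passage should outscore every later one.
import torch
import torch.nn.functional as F

def ranknet_distill_loss(student_scores: torch.Tensor) -> torch.Tensor:
    """student_scores: student's scores for candidates in teacher order."""
    losses = []
    n = student_scores.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            diff = student_scores[i] - student_scores[j]   # should be positive
            losses.append(F.binary_cross_entropy_with_logits(
                diff, torch.ones_like(diff)))
    return torch.stack(losses).mean()

# Usage: loss = ranknet_distill_loss(cross_encoder(query, teacher_sorted_docs))
```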