Fine-tuning LLaMA for multi-stage text retrieval

X Ma, L Wang, N Yang, F Wei, J Lin - Proceedings of the 47th …, 2024 - dl.acm.org
While large language models (LLMs) have shown impressive NLP capabilities, existing IR
applications mainly focus on prompting LLMs to generate query expansions or generating …

RankVicuna: Zero-shot listwise document reranking with open-source large language models

R Pradeep, S Sharifymoghaddam, J Lin - arXiv preprint arXiv:2309.15088, 2023 - arxiv.org
Researchers have successfully applied large language models (LLMs) such as ChatGPT to
reranking in an information retrieval context, but to date, such work has mostly been built on …

RankZephyr: Effective and Robust Zero-Shot Listwise Reranking is a Breeze!

R Pradeep, S Sharifymoghaddam, J Lin - arXiv preprint arXiv:2312.02724, 2023 - arxiv.org
In information retrieval, proprietary large language models (LLMs) such as GPT-4 and open-
source counterparts such as LLaMA and Vicuna have played a vital role in reranking …

Doc2Query--: when less is more

M Gospodinov, S MacAvaney, C Macdonald - European Conference on …, 2023 - Springer
Doc2Query—the process of expanding the content of a document before indexing using a
sequence-to-sequence model—has emerged as a prominent technique for improving the …

The tale of two MSMARCO - and their unfair comparisons

C Lassance, S Clinchant - Proceedings of the 46th International ACM …, 2023 - dl.acm.org
The MS MARCO-passage dataset has been the main large-scale dataset open to the IR
community, and it has successfully fostered the development of novel neural retrieval models …

Simple yet effective neural ranking and reranking baselines for cross-lingual information retrieval

J Lin, D Alfonso-Hermelo, V Jeronymo… - arXiv preprint arXiv …, 2023 - arxiv.org
The advent of multilingual language models has generated a resurgence of interest in cross-
lingual information retrieval (CLIR), which is the task of searching documents in one …

Rank-without-GPT: Building GPT-Independent Listwise Rerankers on Open-Source Large Language Models

X Zhang, S Hofstätter, P Lewis, R Tang, J Lin - arXiv preprint arXiv …, 2023 - arxiv.org
Listwise rerankers based on large language models (LLMs) are the zero-shot state-of-the-art.
However, current works in this direction all depend on GPT models, making it a single …

JaColBERTv2.5: Optimising Multi-Vector Retrievers to Create State-of-the-Art Japanese Retrievers with Constrained Resources

B Clavié - arXiv preprint arXiv:2407.20750, 2024 - arxiv.org
Neural Information Retrieval has advanced rapidly in high-resource languages, but progress
in lower-resource ones such as Japanese has been hindered by data scarcity, among other …

Neural Passage Quality Estimation for Static Pruning

X Chang, D Mishra, C Macdonald… - Proceedings of the 47th …, 2024 - dl.acm.org
Neural networks, especially those that use large, pre-trained language models, have
improved search engines in various ways. Most prominently, they can estimate the …

A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking

F Schlatt, M Fröbe, H Scells, S Zhuang… - arXiv preprint arXiv …, 2024 - arxiv.org
Cross-encoders distilled from large language models are more effective re-rankers than
cross-encoders fine-tuned using manually labeled data. However, the distilled models do …