Holistic evaluation of language models
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …
Biomedical question answering: a survey of approaches and challenges
Automatic Question Answering (QA) has been successfully applied in various domains such
as search engines and chatbots. Biomedical QA (BQA), as an emerging QA task, enables …
Semantic models for the first-stage retrieval: A comprehensive review
Multi-stage ranking pipelines have been a practical solution in modern search systems,
where the first-stage retrieval is to return a subset of candidate documents and latter stages …
Large language models for information retrieval: A survey
As a primary means of information acquisition, information retrieval (IR) systems, such as
search engines, have integrated themselves into our daily lives. These systems also serve …
Text and code embeddings by contrastive pre-training
Text embeddings are useful features in many applications such as semantic search and
computing text similarity. Previous work typically trains models customized for different use …
Pyserini: A Python toolkit for reproducible information retrieval research with sparse and dense representations
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and
dense representations. It aims to provide effective, reproducible, and easy-to-use first-stage …
Learning to tokenize for generative retrieval
As a new paradigm in information retrieval, generative retrieval directly generates a ranked
list of document identifiers (docids) for a given query using generative language models …
Dense text retrieval based on pretrained language models: A survey
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources to user's queries in natural language. From …
Conversational information seeking
Conversational information seeking (CIS) is concerned with a sequence of interactions
between one or more users and an information system. Interactions in CIS are primarily …
In-batch negatives for knowledge distillation with tightly-coupled teachers for dense retrieval
We present an efficient training approach to text retrieval with dense representations that
applies knowledge distillation using the ColBERT late-interaction ranking model …