Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

Biomedical question answering: a survey of approaches and challenges

Q Jin, Z Yuan, G Xiong, Q Yu, H Ying, C Tan… - ACM Computing …, 2022 - dl.acm.org
Automatic Question Answering (QA) has been successfully applied in various domains such
as search engines and chatbots. Biomedical QA (BQA), as an emerging QA task, enables …

Semantic models for the first-stage retrieval: A comprehensive review

J Guo, Y Cai, Y Fan, F Sun, R Zhang… - ACM Transactions on …, 2022 - dl.acm.org
Multi-stage ranking pipelines have been a practical solution in modern search systems,
where the first-stage retrieval is to return a subset of candidate documents and latter stages …

Large language models for information retrieval: A survey

Y Zhu, H Yuan, S Wang, J Liu, W Liu, C Deng… - arXiv preprint arXiv …, 2023 - arxiv.org
As a primary means of information acquisition, information retrieval (IR) systems, such as
search engines, have integrated themselves into our daily lives. These systems also serve …

Text and code embeddings by contrastive pre-training

A Neelakantan, T Xu, R Puri, A Radford, JM Han… - arXiv preprint arXiv …, 2022 - arxiv.org
Text embeddings are useful features in many applications such as semantic search and
computing text similarity. Previous work typically trains models customized for different use …

Pyserini: A Python toolkit for reproducible information retrieval research with sparse and dense representations

J Lin, X Ma, SC Lin, JH Yang, R Pradeep… - Proceedings of the 44th …, 2021 - dl.acm.org
Pyserini is a Python toolkit for reproducible information retrieval research with sparse and
dense representations. It aims to provide effective, reproducible, and easy-to-use first-stage …

Learning to tokenize for generative retrieval

W Sun, L Yan, Z Chen, S Wang, H Zhu… - Advances in …, 2023 - proceedings.neurips.cc
As a new paradigm in information retrieval, generative retrieval directly generates a ranked
list of document identifiers (docids) for a given query using generative language models …

Dense text retrieval based on pretrained language models: A survey

WX Zhao, J Liu, R Ren, JR Wen - ACM Transactions on Information …, 2024 - dl.acm.org
Text retrieval is a long-standing research topic on information seeking, where a system is
required to return relevant information resources to user's queries in natural language. From …

Conversational information seeking

H Zamani, JR Trippas, J Dalton… - … and Trends® in …, 2023 - nowpublishers.com
Conversational information seeking (CIS) is concerned with a sequence of interactions
between one or more users and an information system. Interactions in CIS are primarily …

In-batch negatives for knowledge distillation with tightly-coupled teachers for dense retrieval

SC Lin, JH Yang, J Lin - Proceedings of the 6th Workshop on …, 2021 - aclanthology.org
We present an efficient training approach to text retrieval with dense representations that
applies knowledge distillation using the ColBERT late-interaction ranking model …