Recent advances in deep learning based dialogue systems: A systematic survey

J Ni, T Young, V Pandelea, F Xue… - Artificial Intelligence Review, 2023 - Springer
Dialogue systems are a popular natural language processing (NLP) task, as they are promising in
real-life applications. They are also a complicated task, since many NLP tasks deserving study are …

A survey of evaluation metrics used for NLG systems

AB Sai, AK Mohankumar, MM Khapra - ACM Computing Surveys (CSUR …, 2022 - dl.acm.org
In the last few years, a large number of automatic evaluation metrics have been proposed for
evaluating Natural Language Generation (NLG) systems. The rapid development and …

InstructDial: Improving zero and few-shot generalization in dialogue through instruction tuning

P Gupta, C Jiao, YT Yeh, S Mehri, M Eskenazi… - arXiv preprint arXiv …, 2022 - arxiv.org
Instruction tuning is an emergent paradigm in NLP wherein natural language instructions
are leveraged with language models to induce zero-shot performance on unseen tasks …

DynaEval: Unifying turn and dialogue level evaluation

C Zhang, Y Chen, LF D'Haro, Y Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation
metrics should reflect the dynamics of such interaction. Existing automatic metrics are …

A comprehensive assessment of dialog evaluation metrics

YT Yeh, M Eskenazi, S Mehri - arXiv preprint arXiv:2106.03706, 2021 - arxiv.org
Automatic evaluation metrics are a crucial component of dialog systems research. Standard
language evaluation metrics are known to be ineffective for evaluating dialog. As such …

Length-controlled AlpacaEval: A simple debiasing of automatic evaluators

Y Dubois, P Liang, T Hashimoto - First Conference on Language …, 2024 - openreview.net
LLM-based auto-annotators have become a key component of the LLM development
process due to their cost-effectiveness and scalability compared to human-based …

Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research directions and challenges

S Mehri, J Choi, LF D'Haro, J Deriu, M Eskenazi… - arXiv preprint arXiv …, 2022 - arxiv.org
This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog.
The workshop explored the current state of the art along with its limitations and suggested …

Simple LLM prompting is state-of-the-art for robust and multilingual dialogue evaluation

J Mendonça, P Pereira, H Moniz, JP Carvalho… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite significant research effort in the development of automatic dialogue evaluation
metrics, little thought is given to evaluating dialogues other than in English. At the same time …

Meet your favorite character: Open-domain chatbot mimicking fictional characters with only a few utterances

S Han, B Kim, JY Yoo, S Seo, S Kim, E Erdenee… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we consider mimicking fictional characters as a promising direction for building
engaging conversation models. To this end, we present a new practical task where only a …

Evaluating open-domain dialogues in latent space with next sentence prediction and mutual information

K Zhao, B Yang, C Lin, W Rong, A Villavicencio… - arXiv preprint arXiv …, 2023 - arxiv.org
The long-standing one-to-many issue of open-domain dialogues poses significant
challenges for automatic evaluation methods, i.e., there may be multiple suitable responses …