Recent advances in deep learning based dialogue systems: A systematic survey
Dialogue systems are a popular natural language processing (NLP) task, as they are promising in
real-life applications. They are also a complicated task, since many NLP tasks deserving study are …
A survey of evaluation metrics used for NLG systems
In the last few years, a large number of automatic evaluation metrics have been proposed for
evaluating Natural Language Generation (NLG) systems. The rapid development and …
InstructDial: Improving zero and few-shot generalization in dialogue through instruction tuning
Instruction tuning is an emergent paradigm in NLP wherein natural language instructions
are leveraged with language models to induce zero-shot performance on unseen tasks …
DynaEval: Unifying turn and dialogue level evaluation
A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation
metrics should reflect the dynamics of such interaction. Existing automatic metrics are …
A comprehensive assessment of dialog evaluation metrics
Automatic evaluation metrics are a crucial component of dialog systems research. Standard
language evaluation metrics are known to be ineffective for evaluating dialog. As such …
Length-controlled AlpacaEval: A simple debiasing of automatic evaluators
LLM-based auto-annotators have become a key component of the LLM development
process due to their cost-effectiveness and scalability compared to human-based …
Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research directions and challenges
This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog.
The workshop explored the current state of the art along with its limitations and suggested …
Simple LLM prompting is state-of-the-art for robust and multilingual dialogue evaluation
Despite significant research effort in the development of automatic dialogue evaluation
metrics, little thought is given to evaluating dialogues other than in English. At the same time …
Meet your favorite character: Open-domain chatbot mimicking fictional characters with only a few utterances
In this paper, we consider mimicking fictional characters as a promising direction for building
engaging conversation models. To this end, we present a new practical task where only a …
Evaluating open-domain dialogues in latent space with next sentence prediction and mutual information
The long-standing one-to-many issue of open-domain dialogues poses significant
challenges for automatic evaluation methods, i.e., there may be multiple suitable responses …