Recent advances in deep learning based dialogue systems: A systematic survey

J Ni, T Young, V Pandelea, F Xue… - Artificial Intelligence Review, 2023 - Springer
Dialogue systems are a popular natural language processing (NLP) task, as they are promising in
real-life applications. They are also a complicated task, since many NLP tasks deserving study are …

A survey of evaluation metrics used for NLG systems

AB Sai, AK Mohankumar, MM Khapra - ACM Computing Surveys (CSUR …, 2022 - dl.acm.org
In the last few years, a large number of automatic evaluation metrics have been proposed for
evaluating Natural Language Generation (NLG) systems. The rapid development and …

InstructDial: Improving zero and few-shot generalization in dialogue through instruction tuning

P Gupta, C Jiao, YT Yeh, S Mehri, M Eskenazi… - arXiv preprint arXiv …, 2022 - arxiv.org
Instruction tuning is an emergent paradigm in NLP wherein natural language instructions
are leveraged with language models to induce zero-shot performance on unseen tasks …

DynaEval: Unifying turn and dialogue level evaluation

C Zhang, Y Chen, LF D'Haro, Y Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation
metrics should reflect the dynamics of such interaction. Existing automatic metrics are …

A comprehensive assessment of dialog evaluation metrics

YT Yeh, M Eskenazi, S Mehri - arXiv preprint arXiv:2106.03706, 2021 - arxiv.org
Automatic evaluation metrics are a crucial component of dialog systems research. Standard
language evaluation metrics are known to be ineffective for evaluating dialog. As such …

Length-controlled AlpacaEval: A simple debiasing of automatic evaluators

Y Dubois, P Liang, T Hashimoto - First Conference on Language …, 2024 - openreview.net
LLM-based auto-annotators have become a key component of the LLM development
process due to their cost-effectiveness and scalability compared to human-based …

Report from the NSF Future Directions Workshop on Automatic Evaluation of Dialog: Research directions and challenges

S Mehri, J Choi, LF D'Haro, J Deriu, M Eskenazi… - arXiv preprint arXiv …, 2022 - arxiv.org
This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog.
The workshop explored the current state of the art along with its limitations and suggested …

Simple LLM prompting is state-of-the-art for robust and multilingual dialogue evaluation

J Mendonça, P Pereira, H Moniz, JP Carvalho… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite significant research effort in the development of automatic dialogue evaluation
metrics, little thought is given to evaluating dialogues other than in English. At the same time …

Meet your favorite character: Open-domain chatbot mimicking fictional characters with only a few utterances

S Han, B Kim, JY Yoo, S Seo, S Kim, E Erdenee… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we consider mimicking fictional characters as a promising direction for building
engaging conversation models. To this end, we present a new practical task where only a …

Evaluating open-domain dialogues in latent space with next sentence prediction and mutual information

K Zhao, B Yang, C Lin, W Rong, A Villavicencio… - arXiv preprint arXiv …, 2023 - arxiv.org
The long-standing one-to-many issue of open-domain dialogues poses significant
challenges for automatic evaluation methods, i.e., there may be multiple suitable responses …