Google Scholar

Human evaluation of conversations is an open problem: comparing the sensitivity of various methods for evaluating dialogue agents

EM Smith, O Hsu, R Qian, S Roller, YL Boureau… - arXiv preprint arXiv …, 2022 - arxiv.org

At the heart of improving conversational AI is the open problem of how to evaluate
conversations. Issues with automatic metrics are well known (Liu et al., 2016, arXiv …

Cited by 64 · All 8 versions · HTML version

[PDF] arxiv.org

Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems

SE Finch, JD Finch, JD Choi - arXiv preprint arXiv:2212.09180, 2022 - arxiv.org

Despite tremendous advancements in dialogue systems, stable evaluation still requires
human judgments producing notoriously high-variance metrics due to their inherent …

Cited by 20 · All 6 versions · HTML version

[PDF] arxiv.org

Automatic evaluation and moderation of open-domain dialogue systems

C Zhang, J Sedoc, LF D'Haro, R Banchs… - arXiv preprint arXiv …, 2021 - arxiv.org

The development of Open-Domain Dialogue Systems (ODS) is a trending topic due to the
large number of research challenges, large societal and business impact, and advances in …

Cited by 30 · All 3 versions · HTML version

[PDF] ieee.org

PoE: A panel of experts for generalized automatic dialogue assessment

C Zhang, LF D'Haro, Q Zhang… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org

Chatbots are expected to be knowledgeable across multiple domains, e.g., for daily chit-chat,
exchange of information, and grounding in emotional situations. To effectively measure the …

Cited by 9 · All 4 versions

[PDF] arxiv.org

Psychological metrics for dialog system evaluation

S Giorgi, S Havaldar, F Ahmed, Z Akhtar… - arXiv preprint arXiv …, 2023 - arxiv.org

We present metrics for evaluating dialog systems through a psychologically grounded
"human" lens in which conversational agents express a diversity of both states (e.g., emotion) …

Cited by 2 · All 2 versions · HTML version

[PDF] arxiv.org

Exploring the Impact of Human Evaluator Group on Chat-Oriented Dialogue Evaluation

SE Finch, JD Finch, JD Choi - arXiv preprint arXiv:2309.07998, 2023 - arxiv.org

Human evaluation has been widely accepted as the standard for evaluating chat-oriented
dialogue systems. However, there is a significant variation in previous work regarding who …

All 5 versions · HTML version

An evaluation protocol for generative conversational systems
