Report from the NSF Future Directions Workshop on automatic evaluation of dialog: Research directions and challenges

S Mehri, J Choi, LF D'Haro, J Deriu, M Eskenazi… - arXiv preprint arXiv …, 2022 - arxiv.org
This is a report on the NSF Future Directions Workshop on Automatic Evaluation of Dialog.
The workshop explored the current state of the art along with its limitations and suggested …

BiSyn-GAT+: Bi-syntax aware graph attention network for aspect-based sentiment analysis

S Liang, W Wei, XL Mao, F Wang, Z He - arXiv preprint arXiv:2204.03117, 2022 - arxiv.org
Aspect-based sentiment analysis (ABSA) is a fine-grained sentiment analysis task that aims
to align aspects and corresponding sentiments for aspect-specific sentiment polarity …

DynaEval: Unifying turn and dialogue level evaluation

C Zhang, Y Chen, LF D'Haro, Y Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation
metrics should reflect the dynamics of such interaction. Existing automatic metrics are …

A comprehensive assessment of dialog evaluation metrics

YT Yeh, M Eskenazi, S Mehri - arXiv preprint arXiv:2106.03706, 2021 - arxiv.org
Automatic evaluation metrics are a crucial component of dialog systems research. Standard
language evaluation metrics are known to be ineffective for evaluating dialog. As such …

Deep learning for dialogue systems: Chit-chat and beyond

R Yan, J Li, Z Yu - Foundations and Trends® in Information …, 2022 - nowpublishers.com
With the rapid progress of deep neural models and the explosion of available data
resources, dialogue systems that support extensive topics and chit-chat conversations are …

A novel adaptive marker segmentation graph convolutional network for aspect-level sentiment analysis

P Wang, L Tao, M Tang, M Zhao, L Wang, Y Xu… - Knowledge-Based …, 2023 - Elsevier
Aspect-level sentiment analysis is a fine-grained sentiment classification task that aims to
identify the sentiment polarity of specific aspects in online reviews. Attention mechanisms …

Approximating online human evaluation of social chatbots with prompting

E Svikhnushina, P Pu - arXiv preprint arXiv:2304.05253, 2023 - arxiv.org
As conversational models become increasingly available to the general public, users are
engaging with this technology in social interactions. Such unprecedented interaction …

Deconstruct to reconstruct a configurable evaluation metric for open-domain dialogue systems

V Phy, Y Zhao, A Aizawa - arXiv preprint arXiv:2011.00483, 2020 - arxiv.org
Many automatic evaluation metrics have been proposed to score the overall quality of a
response in open-domain dialogue. Generally, the overall quality comprises various …

Which prompts make the difference? Data prioritization for efficient human LLM evaluation

M Boubdir, E Kim, B Ermis, M Fadaee… - arXiv preprint arXiv …, 2023 - arxiv.org
Human evaluation is increasingly critical for assessing large language models, capturing
linguistic nuances, and reflecting user preferences more accurately than traditional …

xDial-Eval: A multilingual open-domain dialogue evaluation benchmark

C Zhang, LF D'Haro, C Tang, K Shi, G Tang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advancements in reference-free learned metrics for open-domain dialogue
evaluation have been driven by the progress in pre-trained language models and the …