- Academic Search

J Ni, T Young, V Pandelea, F Xue… - Artificial intelligence review, 2023 - Springer

Dialogue systems are a popular natural language processing (NLP) task as it is promising in
real-life applications. It is also a complicated task since many NLP tasks deserving study are …

Cited by: 301

Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org

Abstract Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Cited by: 158

G-Eval: NLG evaluation using GPT-4 with better human alignment

Y Liu, D Iter, Y Xu, S Wang, R Xu, C Zhu - arXiv preprint arXiv:2303.16634, 2023 - arxiv.org

The quality of texts generated by natural language generation (NLG) systems is hard to
measure automatically. Conventional reference-based metrics, such as BLEU and ROUGE …

Cited by: 948

ChatEval: Towards better LLM-based evaluators through multi-agent debate

CM Chan, W Chen, Y Su, J Yu, W Xue, S Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Text evaluation has historically posed significant challenges, often demanding substantial
labor and time cost. With the emergence of large language models (LLMs), researchers …

Cited by: 336

Towards a unified multi-dimensional evaluator for text generation

M Zhong, Y Liu, D Yin, Y Mao, Y Jiao, P Liu… - arXiv preprint arXiv …, 2022 - arxiv.org

Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural
Language Generation (NLG), i.e., evaluating the generated text from multiple explainable …

Cited by: 218

MultiWOZ - a large-scale multi-domain Wizard-of-Oz dataset for task-oriented dialogue modelling

P Budzianowski, TH Wen, BH Tseng… - arXiv preprint arXiv …, 2018 - arxiv.org

Even though machine learning has become the major scene in dialogue research
community, the real breakthrough has been blocked by the scale of data available. To …

Cited by: 1516

A survey of evaluation metrics used for NLG systems

AB Sai, AK Mohankumar, MM Khapra - ACM Computing Surveys (CSUR …, 2022 - dl.acm.org

In the last few years, a large number of automatic evaluation metrics have been proposed for
evaluating Natural Language Generation (NLG) systems. The rapid development and …

Cited by: 283

A survey on dialogue systems: Recent advances and new frontiers

H Chen, X Liu, D Yin, J Tang - ACM SIGKDD Explorations Newsletter, 2017 - dl.acm.org

Dialogue systems have attracted more and more attention. Recent advances on dialogue
systems are overwhelmingly contributed by deep learning techniques, which have been …

Cited by: 953

Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review

B Chen, Z Zhang, N Langrené, S Zhu - arXiv preprint arXiv:2310.14735, 2023 - arxiv.org

This paper delves into the pivotal role of prompt engineering in unleashing the capabilities
of Large Language Models (LLMs). Prompt engineering is the process of structuring input …

Cited by: 233

Survey of the state of the art in natural language generation: Core tasks, applications and evaluation

A Gatt, E Krahmer - Journal of Artificial Intelligence Research, 2018 - jair.org

This paper surveys the current state of the art in Natural Language Generation (NLG),
defined as the task of generating text or speech from non-linguistic input. A survey of NLG is …

Cited by: 1149

Evaluating evaluation methods for generation in the presence of variation

Recent advances in deep learning based dialogue systems: A systematic survey