Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

An empirical survey on long document summarization: Datasets, models, and metrics

HY Koh, J Ju, M Liu, S Pan - ACM Computing Surveys, 2022 - dl.acm.org
Long documents such as academic articles and business reports have been the standard
format for detailing important issues and complicated subjects that require extra attention. An …

G-Eval: NLG evaluation using GPT-4 with better human alignment

Y Liu, D Iter, Y Xu, S Wang, R Xu, C Zhu - arXiv preprint arXiv:2303.16634, 2023 - arxiv.org
The quality of texts generated by natural language generation (NLG) systems is hard to
measure automatically. Conventional reference-based metrics, such as BLEU and ROUGE …
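
As background for this entry: G-Eval scores generated text by prompting a strong LLM with evaluation criteria and chain-of-thought instructions, then weighting candidate scores by their token probabilities. The sketch below shows only the basic LLM-as-judge pattern using the openai Python client; the prompt wording, the model name, and the plain integer parsing are illustrative assumptions, not the paper's exact protocol.

# Minimal LLM-as-judge sketch in the spirit of G-Eval (illustrative; the
# paper additionally weights the 1-5 scores by their token probabilities).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = (
    "You will be given a source document and a summary.\n"
    "Rate the coherence of the summary from 1 (incoherent) to 5\n"
    "(fully coherent). Reply with a single integer.\n\n"
    "Document: {doc}\n\nSummary: {summary}\n\nScore:"
)

def judge_coherence(doc: str, summary: str) -> int:
    """Ask a GPT-4-class model for a 1-5 coherence rating."""
    response = client.chat.completions.create(
        model="gpt-4",  # any strong chat model works for the sketch
        messages=[{"role": "user",
                   "content": PROMPT.format(doc=doc, summary=summary)}],
        temperature=0,
    )
    return int(response.choices[0].message.content.strip())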

Towards a unified multi-dimensional evaluator for text generation

M Zhong, Y Liu, D Yin, Y Mao, Y Jiao, P Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural
Language Generation (NLG), i.e., evaluating the generated text from multiple explainable …
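
The unified evaluator surveyed here (UniEval) casts each quality dimension as a boolean question answered by a trained T5 model. The sketch below reproduces that boolean-QA framing with Hugging Face transformers; the plain t5-base checkpoint and the question wordings are stand-ins chosen for illustration, since the actual trained UniEval checkpoints and templates differ.

# Boolean-QA framing of multi-dimensional evaluation (UniEval-style sketch).
# t5-base is an untrained stand-in for the paper's trained evaluator.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

def yes_probability(question: str, text: str) -> float:
    """Score one dimension as P(yes) vs P(no) for the first decoded token."""
    inputs = tokenizer(f"question: {question} text: {text}",
                       return_tensors="pt", truncation=True)
    start = torch.tensor([[model.config.decoder_start_token_id]])
    logits = model(**inputs, decoder_input_ids=start).logits[0, -1]
    yes_id = tokenizer("yes", add_special_tokens=False).input_ids[0]
    no_id = tokenizer("no", add_special_tokens=False).input_ids[0]
    p_yes, _ = torch.softmax(logits[[yes_id, no_id]], dim=0)
    return p_yes.item()

# One illustrative question per dimension.
dimensions = {
    "coherence": "Is this summary coherent and well structured?",
    "fluency": "Is this summary fluent and grammatical?",
}
summary = "some generated summary"
scores = {dim: yes_probability(q, summary) for dim, q in dimensions.items()}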

SummEval: Re-evaluating summarization evaluation

AR Fabbri, W Kryściński, B McCann, C Xiong… - Transactions of the …, 2021 - direct.mit.edu
The scarcity of comprehensive up-to-date studies on evaluation metrics for text
summarization and the lack of consensus regarding evaluation protocols continue to inhibit …
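
Meta-evaluations like SummEval judge a metric mainly by how well its scores correlate with expert human ratings on the same system outputs. A minimal sketch of that correlation step, with made-up numbers standing in for real annotations:

# Correlating automatic metric scores with human judgments, the core
# measurement in metric meta-evaluation. All numbers below are fabricated
# placeholders for illustration.
from scipy.stats import kendalltau, spearmanr

human_ratings = [4.5, 2.0, 3.5, 5.0, 1.5]        # hypothetical expert scores
metric_scores = [0.71, 0.42, 0.55, 0.80, 0.30]   # hypothetical metric outputs

tau, tau_p = kendalltau(human_ratings, metric_scores)
rho, rho_p = spearmanr(human_ratings, metric_scores)
print(f"Kendall tau = {tau:.3f} (p = {tau_p:.3f})")
print(f"Spearman rho = {rho:.3f} (p = {rho_p:.3f})")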

MAUVE: Measuring the gap between neural text and human text using divergence frontiers

K Pillutla, S Swayamdipta, R Zellers… - Advances in …, 2021 - proceedings.neurips.cc
As major progress is made in open-ended text generation, measuring how close machine-
generated text is to human language remains a critical open problem. We introduce MAUVE …
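
MAUVE compares the model and human text distributions through a divergence frontier: the KL divergences of each distribution against mixtures of the two, summarized by the area under the resulting curve. The toy sketch below computes that frontier for two small discrete distributions; the real metric first embeds and quantizes raw text into such histograms, and the scaling constant c is an arbitrary choice here. The authors also released a reference implementation that goes straight from raw text to the final score.

# Toy divergence frontier between discrete distributions P (model text)
# and Q (human text). MAUVE applies this to quantized text embeddings.
import numpy as np

def kl(p: np.ndarray, q: np.ndarray) -> float:
    """KL(p || q); assumes q > 0 wherever p > 0."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def divergence_frontier(p, q, c=5.0, n=99):
    """Points (exp(-c*KL(Q||R)), exp(-c*KL(P||R))) for mixtures R = l*P + (1-l)*Q."""
    points = []
    for lam in np.linspace(0.01, 0.99, n):
        r = lam * p + (1 - lam) * q
        points.append((np.exp(-c * kl(q, r)), np.exp(-c * kl(p, r))))
    return np.array(sorted(points))

p = np.array([0.6, 0.3, 0.1])   # toy model-text histogram
q = np.array([0.4, 0.4, 0.2])   # toy human-text histogram
frontier = divergence_frontier(p, q)
area = np.trapz(frontier[:, 1], frontier[:, 0])  # MAUVE-style summary: area under the frontier
print(f"area under frontier ~ {area:.3f}")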

BERTScore: Evaluating text generation with BERT

T Zhang, V Kishore, F Wu, KQ Weinberger… - arXiv preprint arXiv …, 2019 - arxiv.org
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to
common metrics, BERTScore computes a similarity score for each token in the candidate …
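
The snippet above describes BERTScore's core computation: each candidate token is greedily matched to its most similar reference token (and vice versa) in contextual-embedding space, yielding precision, recall, and F1. A minimal sketch of that matching, assuming the token embeddings are already extracted and L2-normalized; the released bert-score package also handles embedding extraction, IDF weighting, and baseline rescaling.

# Greedy-matching core of BERTScore. E_cand and E_ref are (tokens x dims)
# matrices of L2-normalized contextual embeddings for the candidate and
# reference sentences (producing them from BERT is assumed done upstream).
import numpy as np

def bertscore_f1(E_cand: np.ndarray, E_ref: np.ndarray) -> float:
    sim = E_cand @ E_ref.T                 # cosine similarity (rows are unit norm)
    precision = sim.max(axis=1).mean()     # best reference match per candidate token
    recall = sim.max(axis=0).mean()        # best candidate match per reference token
    return 2 * precision * recall / (precision + recall)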

MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance

W Zhao, M Peyrard, F Liu, Y Gao, CM Meyer… - arXiv preprint arXiv …, 2019 - arxiv.org
A robust evaluation metric has a profound impact on the development of text generation
systems. A desirable metric compares system output against references based on their …
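
Where BERTScore matches each token to a single best counterpart, MoverScore relaxes this to a soft many-to-many alignment solved as an Earth Mover's Distance problem over contextualized embeddings. A sketch using the POT optimal-transport library; the uniform token weights and plain Euclidean cost are simplifications of the paper's IDF weighting and n-gram aggregation.

# Earth Mover's Distance between two token-embedding clouds, the transport
# problem at the heart of MoverScore. Requires the POT package ("pip install pot").
import numpy as np
import ot  # Python Optimal Transport

def embedding_emd(X: np.ndarray, Y: np.ndarray) -> float:
    """Minimal cost of moving candidate embeddings X onto reference embeddings Y."""
    a = np.full(len(X), 1.0 / len(X))   # uniform mass on candidate tokens
    b = np.full(len(Y), 1.0 / len(Y))   # uniform mass on reference tokens
    cost = ot.dist(X, Y, metric="euclidean")
    return float(ot.emd2(a, b, cost))   # exact EMD via linear programming; lower = closer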

A survey of the usages of deep learning for natural language processing

DW Otter, JR Medina, JK Kalita - IEEE Transactions on Neural …, 2020 - ieeexplore.ieee.org
Over the last several years, the field of natural language processing has been propelled
forward by an explosion in the use of deep learning models. This article provides a brief …

A survey of evaluation metrics used for NLG systems

AB Sai, AK Mohankumar, MM Khapra - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
In the last few years, a large number of automatic evaluation metrics have been proposed for
evaluating Natural Language Generation (NLG) systems. The rapid development and …