Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

A survey of evaluation metrics used for NLG systems

AB Sai, AK Mohankumar, MM Khapra - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
In the last few years, a large number of automatic evaluation metrics have been proposed for
evaluating Natural Language Generation (NLG) systems. The rapid development and …

Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

K Krishna, Y Song, M Karpinska… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
The rise in malicious usage of large language models, such as fake content creation and
academic plagiarism, has motivated the development of approaches that identify AI …
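The defense described in this abstract amounts to checking a candidate text against a store of the model's own past generations rather than classifying the text in isolation. Below is a minimal sketch of that idea, assuming the `sentence-transformers` package with an illustrative `all-MiniLM-L6-v2` encoder and a hypothetical similarity threshold; it is not the paper's corpus-scale retrieval system.

```python
# A minimal sketch of a retrieval-based detection defense: keep embeddings of
# text the model has generated, and flag a candidate as machine-generated if
# its nearest stored generation is sufficiently similar. Model name and
# threshold are illustrative choices, not the paper's configuration.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")

# toy stand-in for a log of the model's past generations
generation_log = [
    "The Eiffel Tower was completed in 1889 for the World's Fair in Paris.",
    "Quantum computers exploit superposition to explore many states at once.",
]
log_embs = encoder.encode(generation_log, normalize_embeddings=True)

def is_machine_generated(candidate: str, threshold: float = 0.75) -> bool:
    cand_emb = encoder.encode([candidate], normalize_embeddings=True)[0]
    sims = log_embs @ cand_emb          # cosine similarity (embeddings are normalized)
    return float(sims.max()) >= threshold

# a paraphrase of a logged generation should still be retrieved and flagged
print(is_machine_generated("Paris finished building the Eiffel Tower in 1889 for a World's Fair."))
```

Because retrieval matches against the semantics of a stored generation rather than surface statistics, a paraphrased copy remains close to its source in embedding space, which is the property the paper exploits.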

Confident adaptive language modeling

T Schuster, A Fisch, J Gupta… - Advances in Neural Information Processing Systems, 2022 - proceedings.neurips.cc
Recent advances in Transformer-based large language models (LLMs) have led to
significant performance improvements across many tasks. These gains come with a drastic …
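The efficiency gains referred to here come from per-token early exiting: decoding a token can stop at an intermediate decoder layer once the model is confident enough. The toy sketch below illustrates that control flow under simplifying assumptions (a placeholder list of per-layer hidden states and a shared output head); it is not the paper's T5-based implementation or its calibrated confidence measures.

```python
# Toy illustration of confidence-based early exiting: project the hidden state
# after each decoder layer through the output head, and stop once the top
# probability for the next token exceeds a threshold.
import torch

def early_exit_next_token(hidden_states_per_layer, output_head, threshold=0.9):
    """hidden_states_per_layer: list of (dim,) tensors, one per decoder layer."""
    for layer_idx, h in enumerate(hidden_states_per_layer):
        probs = torch.softmax(output_head(h), dim=-1)
        confidence, token_id = probs.max(dim=-1)
        if confidence >= threshold:                 # confident enough: skip the remaining layers
            return token_id.item(), layer_idx + 1
    return token_id.item(), len(hidden_states_per_layer)   # fall back to the full stack

# toy usage: 12 fake decoder layers, vocabulary of 100
vocab, dim = 100, 64
head = torch.nn.Linear(dim, vocab)
states = [torch.randn(dim) for _ in range(12)]
token, layers_used = early_exit_next_token(states, head)
print(f"emitted token {token} after {layers_used} layers")
```

Easy tokens exit early and cheap, hard tokens use the full depth, which is where the reported speedups come from.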

SimCLS: A simple framework for contrastive learning of abstractive summarization

Y Liu, P Liu - arXiv preprint arXiv:2106.01890, 2021 - arxiv.org
In this paper, we present a conceptually simple yet empirically powerful framework for
abstractive summarization, SimCLS, which can bridge the gap between the learning …
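SimCLS closes that gap by re-ranking beam-search candidates with a scorer trained at the candidate level. The sketch below is a schematic of that re-ranking idea under simplifying assumptions: random tensors stand in for RoBERTa embeddings, candidates are scored by cosine similarity to the source document, and the scorer is trained with a pairwise margin ranking loss over candidates pre-sorted by a reference-based metric such as ROUGE.

```python
# Schematic sketch of candidate-level contrastive re-ranking: the scorer should
# assign higher similarity scores to candidates with higher reference-based quality.
import torch
import torch.nn.functional as F

def candidate_scores(doc_emb: torch.Tensor, cand_embs: torch.Tensor) -> torch.Tensor:
    """Cosine similarity between the document embedding (dim,) and each candidate (n, dim)."""
    return F.cosine_similarity(doc_emb.unsqueeze(0), cand_embs, dim=-1)

def ranking_loss(scores: torch.Tensor, margin: float = 0.01) -> torch.Tensor:
    """Pairwise margin loss; `scores` are ordered from best to worst candidate."""
    loss = scores.new_zeros(())
    n = scores.size(0)
    for i in range(n):
        for j in range(i + 1, n):
            # the better-ranked candidate i should outscore j by a rank-scaled margin
            loss = loss + F.relu(scores[j] - scores[i] + margin * (j - i))
    return loss

# toy usage: random embeddings standing in for encoder outputs
doc = torch.randn(768)
cands = torch.randn(4, 768, requires_grad=True)   # 4 beam-search candidates, pre-sorted by ROUGE
loss = ranking_loss(candidate_scores(doc, cands))
loss.backward()
```

At inference time the candidate with the highest score is selected, so the evaluation-time criterion and the training objective operate on the same candidate-level quantity.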

BERTScore: Evaluating text generation with BERT

T Zhang, V Kishore, F Wu, KQ Weinberger… - arXiv preprint arXiv …, 2019 - arxiv.org
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to
common metrics, BERTScore computes a similarity score for each token in the candidate …
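The token-level matching the abstract refers to can be written down compactly: contextual embeddings for both texts, pairwise cosine similarities, and greedy matching in each direction. The sketch below assumes a `bert-base-uncased` encoder from Hugging Face `transformers` and omits the paper's IDF weighting and baseline rescaling; the official `bert_score` package is the reference implementation.

```python
# Minimal BERTScore-style metric: greedy cosine-similarity matching between
# candidate and reference tokens, combined into an F1 score.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def embed(text: str) -> torch.Tensor:
    """L2-normalized contextual embeddings per token, special tokens dropped."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]   # (seq_len, dim)
    hidden = hidden[1:-1]                            # drop [CLS] and [SEP]
    return torch.nn.functional.normalize(hidden, dim=-1)

def bertscore_f1(candidate: str, reference: str) -> float:
    cand, ref = embed(candidate), embed(reference)
    sim = cand @ ref.T                               # pairwise cosine similarities
    precision = sim.max(dim=1).values.mean()         # each candidate token -> best reference token
    recall = sim.max(dim=0).values.mean()            # each reference token -> best candidate token
    return (2 * precision * recall / (precision + recall)).item()

print(bertscore_f1("the cat sat on the mat", "a cat was sitting on the mat"))
```

Because matching happens in embedding space rather than over exact n-grams, paraphrases of the reference score well even when they share few surface tokens.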

Reformulating unsupervised style transfer as paraphrase generation

K Krishna, J Wieting, M Iyyer - arXiv preprint arXiv:2010.05700, 2020 - arxiv.org
Modern NLP defines the task of style transfer as modifying the style of a given sentence
without appreciably changing its semantics, which implies that the outputs of style transfer …

You only prompt once: On the capabilities of prompt learning on large language models to tackle toxic content

X He, S Zannettou, Y Shen… - 2024 IEEE Symposium on …, 2024 - ieeexplore.ieee.org
The spread of toxic content online is an important problem that has adverse effects on user
experience online and in our society at large. Motivated by the importance and impact of the …

Semantic similarity metrics for evaluating source code summarization

S Haque, Z Eberhart, A Bansal… - Proceedings of the 30th …, 2022 - dl.acm.org
Source code summarization involves creating brief descriptions of source code in natural
language. These descriptions are a key component of software documentation such as …

LongEval: Guidelines for human evaluation of faithfulness in long-form summarization

K Krishna, E Bransom, B Kuehl, M Iyyer… - arXiv preprint arXiv …, 2023 - arxiv.org
While human evaluation remains best practice for accurately judging the faithfulness of
automatically-generated summaries, few solutions exist to address the increased difficulty …