- Academic Search

Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org

Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Cited by 156, all 6 versions

[PDF] nih.gov

An integrative survey on mental health conversational agents to bridge computer science and medical perspectives

YM Cho, S Rai, L Ungar, J Sedoc… - Proceedings of the …, 2023 - pmc.ncbi.nlm.nih.gov

Mental health conversational agents (aka chatbots) are widely studied for their potential to
offer accessible support to those experiencing mental health challenges. Previous surveys …

Cited by 16, all 8 versions

[PDF] pubpub.org

[PDF] AI transparency in the age of LLMs: A human-centered research roadmap

QV Liao, JW Vaughan - ar…

Stereotyping Norwegian salmon: An inventory of pitfalls in fairness benchmark datasets

SL Blodgett, G Lopez, A Olteanu, R Sim… - Proceedings of the …, 2021 - aclanthology.org

Auditing NLP systems for computational harms like surfacing stereotypes is an elusive goal.
Several recent efforts have focused on benchmark datasets consisting of pairs of contrastive …

Cited by 319, all 4 versions

[PDF] arxiv.org

"I'm sorry to hear that": Finding New Biases in Language Models with a Holistic Descriptor Dataset

EM Smith, M Hall, M Kambadur, E Presani… - arXiv preprint arXiv …, 2022 - arxiv.org

As language models grow in popularity, it becomes increasingly important to clearly
measure all possible markers of demographic identity in order to avoid perpetuating existing …

Cited by 148, all 3 versions

[PDF] arxiv.org

Evaluation of text generation: A survey

A Celikyilmaz, E Clark, J Gao - arXiv preprint arXiv:2006.14799, 2020 - arxiv.org

The paper surveys evaluation methods of natural language generation (NLG) systems that
have been developed in the last few years. We group NLG evaluation methods into three …

Cited by 429, all 2 versions

[PDF] mit.edu

Measuring attribution in natural language generation models

H Rashkin, V Nikolaev, M Lamm, L Aroyo… - Computational …, 2023 - direct.mit.edu

Large neural models have brought a new challenge to natural language generation (NLG): It
has become imperative to ensure the safety and reliability of the output of models that …

Cited by 142, all 8 versions

[PDF] arxiv.org

Is GPT-3 text indistinguishable from human text? Scarecrow: A framework for scrutinizing machine text

Y Dou, M Forbes, R Koncel-Kedziorski… - arXiv preprint arXiv …, 2021 - arxiv.org

Modern neural language models can produce remarkably fluent and grammatical text. So
much, in fact, that recent work by Clark et al. (2021) has reported that conventional …

Cited by 172, all 5 versions

[PDF] arxiv.org

The perils of using Mechanical Turk to evaluate open-ended text generation

M Karpinska, N Akoury, M Iyyer - arXiv preprint arXiv:2109.06835, 2021 - arxiv.org

Recent text generation research has increasingly focused on open-ended domains such as
story and poetry generation. Because models built for such tasks are difficult to evaluate …

Cited by 128, all 6 versions
