Human evaluation of automatically generated text: Current trends and best practice guidelines

C Van der Lee, A Gatt, E Van Miltenburg… - Computer Speech & …, 2021 - Elsevier
Currently, there is little agreement as to how Natural Language Generation (NLG) systems
should be evaluated, with a particularly high degree of variation in the way that human …

Systematic testing of three Language Models reveals low language accuracy, absence of response stability, and a yes-response bias

V Dentella, F Günther, E Leivada - … of the National Academy of Sciences, 2023 - pnas.org
Humans are universally good in providing stable and accurate judgments about what forms
part of their language and what not. Large Language Models (LMs) are claimed to possess …

Linguistic representation and processing of copredication

E Murphy - 2021 - discovery.ucl.ac.uk
This thesis addresses the lexical and psycholinguistic properties of copredication. In
particular, it explores its acceptability, frequency, crosslinguistic and electrophysiological …

The effect of three basic task features on the sensitivity of acceptability judgment tasks

P Marty, E Chemla, J Sprouse - Glossa: a journal of general linguistics …, 2020 - hal.science
Sprouse and Almeida (2017) provide a first systematic investigation of the sensitivity of four
acceptability judgment tasks. In this project, we build on these results by decomposing those …

The application of signal detection theory to acceptability judgments

Y Huang, F Ferreira - Frontiers in Psychology, 2020 - frontiersin.org
Acceptability judgments have been an important tool in language research. By asking a
native speaker whether a linguistic token is acceptable, linguists and psycholinguists can …

Sentence acceptability experiments: What, how, and why

G Goodall - The Cambridge handbook of experimental syntax, 2021 - books.google.com
Sentence acceptability experiments have become increasingly common since Cowart
(1997) first presented a detailed method for carrying them out, but there is still relatively little …

Investigating representations of verb bias in neural language models

RD Hawkins, T Yamakoshi, TL Griffiths… - arXiv preprint arXiv …, 2020 - arxiv.org
Languages typically provide more than one grammatical construction to express certain
types of messages. A speaker's choice of construction is known to depend on multiple …

Approaching gradience in acceptability with the tools of signal detection theory

B Dillon, M Wagers - 2019 - escholarship.org
This chapter outlines a framework for using signal detection theory (SDT) to guide the
design and analysis of acceptability judgment studies in experimental linguistics. It presents …

A multi-level methodology for the automated translation of a coreference resolution dataset: an application to the Italian language

A Minutolo, R Guarasci, E Damiano, G De Pietro… - Neural Computing and …, 2022 - Springer
In the last decade, the demand for readily accessible corpora has touched all areas of
natural language processing, including coreference resolution. However, it is one of the …

Assessing introspective linguistic judgments quantitatively: the case of The Syntax of Chinese

Z Chen, Y Xu, Z Xie - Journal of East Asian Linguistics, 2020 - Springer
The informal judgments of the well-formedness of phrases and sentences have long been
used as the primary data source for syntacticians. In recent years, the reliability of data …