Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

A review of signals used in sleep analysis

A Roebuck, V Monasterio, E Gederi… - Physiological …, 2013 - iopscience.iop.org
This article presents a review of signals used for measuring physiology and activity during
sleep and techniques for extracting information from these signals. We examine both clinical …

The data-production dispositif

M Miceli, J Posada - Proceedings of the ACM on Human-Computer …, 2022 - dl.acm.org
Machine learning (ML) depends on data to train and verify models. Very often, organizations
outsource processes related to data work (i.e., generating and annotating data and …

Studying up machine learning data: Why talk about bias when we mean power?

M Miceli, J Posada, T Yang - Proceedings of the ACM on Human …, 2022 - dl.acm.org
Research in machine learning (ML) has argued that models trained on incomplete or biased
datasets can lead to discriminatory outputs. In this commentary, we propose moving the …

Inter-coder agreement for computational linguistics

R Artstein, M Poesio - Computational Linguistics, 2008 - direct.mit.edu
This article is a survey of methods for measuring agreement among corpus annotators. It
exposes the mathematics and underlying assumptions of agreement coefficients, covering …

AnCora-CO: Coreferentially annotated corpora for Spanish and Catalan

M Recasens, MA Martí - Language Resources and Evaluation, 2010 - Springer
This article describes the enrichment of the AnCora corpora of Spanish and Catalan (400 k
each) with coreference links between pronouns (including elliptical subjects and clitics), full …

Underreporting of errors in NLG output, and what to do about it

E Van Miltenburg, MA Clinciu, O Dušek… - arXiv preprint arXiv …, 2021 - arxiv.org
We observe a severe under-reporting of the different kinds of errors that Natural Language
Generation systems make. This is a problem, because mistakes are an important indicator of …

Reliability measurement without limits

D Reidsma, J Carletta - Computational Linguistics, 2008 - direct.mit.edu
In computational linguistics, a reliability measurement of 0.8 on some statistic such as κ is
widely thought to guarantee that hand-coded data is fit for purpose, with 0.67 to 0.8 …
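The two Computational Linguistics entries above concern chance-corrected agreement coefficients such as κ. As a minimal sketch, assuming exactly two coders and Cohen's κ (chance agreement estimated from each coder's own label distribution), the coefficient is κ = (A_o − A_e) / (1 − A_e), where A_o is observed agreement and A_e is expected chance agreement. The helper name `cohen_kappa` and the toy labels are hypothetical; this is an illustration, not code from either article.

```python
from collections import Counter


def cohen_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same items (hypothetical helper)."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement A_o: fraction of items given the same label by both coders.
    a_o = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    # Expected agreement A_e: product of each coder's own marginal label proportions.
    counts_a = Counter(coder_a)
    counts_b = Counter(coder_b)
    a_e = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if a_e == 1.0:
        raise ValueError("kappa is undefined when expected agreement is 1")
    return (a_o - a_e) / (1 - a_e)


# Toy example: two hypothetical annotators labelling the same 10 items.
a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes"]
b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes", "yes", "yes"]
print(round(cohen_kappa(a, b), 3))  # prints 0.583
```

Here the coders agree on 8 of 10 items (A_o = 0.8), but both label 60% of items "yes", so A_e = 0.6·0.6 + 0.4·0.4 = 0.52 and κ ≈ 0.583. This shows why a raw agreement figure overstates reliability once chance agreement is discounted.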

Ransomware: How attacker's effort, victim characteristics and context influence ransom requested, payment and financial loss

T Meurs, M Junger, E Tews… - 2022 APWG symposium …, 2022 - ieeexplore.ieee.org
In recent years, ransomware attacks have led to disastrous consequences for victims, not
just due to the payment ransom amount but also due to the recovery costs associated with …