Learning from disagreement: A survey

AN Uma, T Fornaciari, D Hovy, S Paun, B Plank… - Journal of Artificial …, 2021 - jair.org
Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer
evidence that humans disagree, from objective tasks such as part-of-speech tagging to more …

Computational models of anaphora

M Poesio, J Yu, S Paun, A Aloraini, P Lu… - Annual Review of …, 2023 - annualreviews.org
Interpreting anaphoric references is a fundamental aspect of our language competence that
has long attracted the attention of computational linguists. The appearance of ever-larger …

Inherent disagreements in human textual inferences

E Pavlick, T Kwiatkowski - Transactions of the Association for …, 2019 - direct.mit.edu
We analyze humans' disagreements about the validity of natural language inferences. We
show that, very often, disagreements are not dismissible as annotation “noise”, but rather …

Investigating reasons for disagreement in natural language inference

NJ Jiang, MC Marneffe - Transactions of the Association for …, 2022 - direct.mit.edu
We investigate how disagreement in natural language inference (NLI) annotation arises. We
developed a taxonomy of disagreement sources with 10 categories spanning 3 high-level …

We need to consider disagreement in evaluation

V Basile, M Fell, T Fornaciari, D Hovy, S Paun… - Proceedings of the 1st …, 2021 - iris.unito.it
Where have we been, and where are we going? It is easier to talk about the past than the
future. These days, benchmarks evolve more bottom up (such as papers with code). There …

Quoref: A reading comprehension dataset with questions requiring coreferential reasoning

P Dasigi, NF Liu, A Marasović, NA Smith… - arXiv preprint arXiv …, 2019 - arxiv.org
Machine comprehension of texts longer than a single sentence often requires coreference
resolution. However, most current reading comprehension benchmarks do not contain …

SemEval-2023 task 11: Learning with disagreements (LeWiDi)

E Leonardelli, A Uma, G Abercrombie… - arXiv preprint arXiv …, 2023 - arxiv.org
NLP datasets annotated with human judgments are rife with disagreements between the
judges. This is especially true for tasks depending on subjective judgments such as …

An annotated dataset of coreference in English literature

D Bamman, O Lewke, A Mansoor - arXiv preprint arXiv:1912.01140, 2019 - arxiv.org
We present in this work a new dataset of coreference annotations for works of literature in
English, covering 29,103 mentions in 210,532 tokens from 100 works of fiction. This dataset …

SemEval-2021 task 12: Learning with disagreements

A Uma, T Fornaciari, A Dumitrache… - Proceedings of the …, 2021 - aclanthology.org
Disagreement between coders is ubiquitous in virtually all datasets annotated with human
judgements in both natural language processing and computer vision. However, most …

A case for soft loss functions

A Uma, T Fornaciari, D Hovy, S Paun, B Plank… - Proceedings of the …, 2020 - ojs.aaai.org
Recently, Peterson et al. provided evidence of the benefits of using probabilistic soft labels
generated from crowd annotations for training a computer vision model, showing that using …