Learning from disagreement: A survey
Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer
evidence that humans disagree, from objective tasks such as part-of-speech tagging to more …
Computational models of anaphora
Interpreting anaphoric references is a fundamental aspect of our language competence that
has long attracted the attention of computational linguists. The appearance of ever-larger …
Inherent disagreements in human textual inferences
We analyze humans' disagreements about the validity of natural language inferences. We
show that, very often, disagreements are not dismissible as annotation “noise”, but rather …
Investigating reasons for disagreement in natural language inference
We investigate how disagreement in natural language inference (NLI) annotation arises. We
developed a taxonomy of disagreement sources with 10 categories spanning 3 high-level …
We need to consider disagreement in evaluation
Where have we been, and where are we going? It is easier to talk about the past than the
future. These days, benchmarks evolve more bottom up (such as papers with code). There …
Quoref: A reading comprehension dataset with questions requiring coreferential reasoning
Machine comprehension of texts longer than a single sentence often requires coreference
resolution. However, most current reading comprehension benchmarks do not contain …
SemEval-2023 task 11: Learning with disagreements (LeWiDi)
NLP datasets annotated with human judgments are rife with disagreements between the
judges. This is especially true for tasks depending on subjective judgments such as …
An annotated dataset of coreference in English literature
D Bamman, O Lewke, A Mansoor - arXiv preprint arXiv:1912.01140, 2019 - arxiv.org
We present in this work a new dataset of coreference annotations for works of literature in
English, covering 29,103 mentions in 210,532 tokens from 100 works of fiction. This dataset …
SemEval-2021 task 12: Learning with disagreements
Disagreement between coders is ubiquitous in virtually all datasets annotated with human
judgements in both natural language processing and computer vision. However, most …
A case for soft loss functions
Recently, Peterson et al. provided evidence of the benefits of using probabilistic soft labels
generated from crowd annotations for training a computer vision model, showing that using …