The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

B Plank - arXiv preprint arXiv:2211.02570, 2022 - arxiv.org
Human variation in labeling is often considered noise. Annotation projects for machine
learning (ML) aim to minimize human label variation, on the assumption that this maximizes …

Learning from disagreement: A survey

AN Uma, T Fornaciari, D Hovy, S Paun, B Plank… - Journal of Artificial …, 2021 - jair.org
Many tasks in Natural Language Processing (NLP) and Computer Vision (CV) offer
evidence that humans disagree, from objective tasks such as part-of-speech tagging to more …

Inherent disagreements in human textual inferences

E Pavlick, T Kwiatkowski - Transactions of the Association for …, 2019 - direct.mit.edu
We analyze humans' disagreements about the validity of natural language inferences. We
show that, very often, disagreements are not dismissible as annotation “noise”, but rather …

Why don't you do it right? Analysing annotators' disagreement in subjective tasks

M Sandri, E Leonardelli, S Tonelli… - Proceedings of the 17th …, 2023 - aclanthology.org
Annotators' disagreement in linguistic data has recently been the focus of multiple initiatives
aimed at raising awareness of issues related to 'majority voting' when aggregating diverging …

Agreeing to disagree: Annotating offensive language datasets with annotators' disagreement

E Leonardelli, S Menini, AP Aprosio, M Guerini… - arXiv preprint arXiv …, 2021 - arxiv.org
Since state-of-the-art approaches to offensive language detection rely on supervised
learning, it is crucial to quickly adapt them to the continuously evolving scenario of social …

What can we learn from collective human opinions on natural language inference data?

Y Nie, X Zhou, M Bansal - arXiv preprint arXiv:2010.03532, 2020 - arxiv.org
Despite the subjective nature of many NLP tasks, most NLU evaluations have focused on
using the majority label with presumably high agreement as the ground truth. Less attention …

Consensus and subjectivity of skin tone annotation for ML fairness

C Schumann, F Olanubi, A Wright… - Advances in …, 2023 - proceedings.neurips.cc
Understanding different human attributes and how they affect model behavior may become
a standard need for all model creation and usage, from traditional computer vision tasks to …

Everyone's voice matters: Quantifying annotation disagreement using demographic information

R Wan, J Kim, D Kang - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org
In NLP annotation, it is common to have multiple annotators label the text and then obtain
the ground truth labels based on the majority of annotators' agreement. However, annotators are …

Conformalized credal set predictors

A Javanmardi, D Stutz… - Advances in Neural …, 2025 - proceedings.neurips.cc
Credal sets are sets of probability distributions that are considered as candidates for an
imprecisely known ground-truth distribution. In machine learning, they have recently …

Learning part-of-speech taggers with inter-annotator agreement loss

B Plank, D Hovy, A Søgaard - Proceedings of EACL, 2014 - iris.unibocconi.it
In natural language processing (NLP) annotation projects, we use inter-annotator
agreement measures and annotation guidelines to ensure consistent annotations. However …