We're afraid language models aren't modeling ambiguity

A Liu, Z Wu, J Michael, A Suhr, P West, A Koller… - arXiv preprint arXiv …, 2023 - arxiv.org
Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of
human language understanding, allowing us to anticipate misunderstanding as …

Investigating reasons for disagreement in natural language inference

NJ Jiang, MC de Marneffe - Transactions of the Association for …, 2022 - direct.mit.edu
We investigate how disagreement in natural language inference (NLI) annotation arises. We
developed a taxonomy of disagreement sources with 10 categories spanning 3 high-level …

Stop measuring calibration when humans disagree

J Baan, W Aziz, B Plank, R Fernández - arXiv preprint arXiv:2210.16133, 2022 - arxiv.org
Calibration is a popular framework to evaluate whether a classifier knows when it does not
know, i.e., its predictive probabilities are a good indication of how likely a prediction is to be …
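
For context (not part of the cited entry): the snippet defines calibration as predictive probabilities matching the likelihood of a prediction being correct. Below is a minimal sketch of one common way this is measured, expected calibration error with equal-width confidence bins; the function name, binning choice, and toy numbers are illustrative assumptions, not this paper's method (the paper itself questions this framing when annotators disagree).

import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    # Standard ECE sketch: bin predictions by confidence, then compare the mean
    # confidence to the empirical accuracy in each bin, weighted by bin size.
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if lo == 0.0:  # put confidences of exactly 0 in the first bin
            mask |= confidences == 0.0
        if mask.any():
            ece += mask.mean() * abs(confidences[mask].mean() - correct[mask].mean())
    return ece

# Toy usage: confidence of the predicted label vs. agreement with a single "gold" label.
print(expected_calibration_error([0.9, 0.8, 0.6, 0.55], [1, 1, 0, 1]))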

SemEval-2024 Task 6: SHROOM, a shared-task on hallucinations and related observable overgeneration mistakes

T Mickus, E Zosa, R Vázquez, T Vahtola… - Proceedings of the …, 2024 - aclanthology.org
This paper presents the results of the SHROOM, a shared task focused on detecting
hallucinations: outputs from natural language generation (NLG) systems that are fluent, yet …

How (not) to use sociodemographic information for subjective NLP tasks

T Beck, H Schuff, A Lauscher, I Gurevych - arXiv preprint arXiv:2309.07034, 2023 - arxiv.org
Annotators' sociodemographic backgrounds (i.e., the individual compositions of their gender,
age, educational background, etc.) have a strong impact on their decisions when working on …

Learning with different amounts of annotation: From zero to many labels

S Zhang, C Gong, E Choi - arXiv preprint arXiv:2109.04408, 2021 - arxiv.org
Training NLP systems typically assumes access to annotated data that has a single human
label per example. Given imperfect labeling from annotators and inherent ambiguity of …

You are what you annotate: Towards better models through annotator representations

N Deng, XF Zhang, S Liu, W Wu, L Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Annotator disagreement is ubiquitous in natural language processing (NLP) tasks. There are
multiple reasons for such disagreements, including the subjectivity of the task, difficult cases …

Fundamental Problems With Model Editing: How Should Rational Belief Revision Work in LLMs?

P Hase, T Hofweber, X Zhou, E Stengel-Eskin… - arXiv preprint arXiv …, 2024 - arxiv.org
The model editing problem concerns how language models should learn new facts about
the world over time. While empirical research on model editing has drawn widespread …

Augmenting industrial chatbots in energy systems using ChatGPT generative AI

G Gamage, S Kahawala, N Mills… - 2023 IEEE 32nd …, 2023 - ieeexplore.ieee.org
Chatbots, the automation of communicative labor, have been widely deployed in industrial
applications and systems. Built upon the Generative Pre-trained Transformer 3 (GPT-3) …

" Seeing the Big through the Small": Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?

B Chen, X Wang, S Peng, R Litschko… - arXiv preprint arXiv …, 2024 - arxiv.org
Human label variation (HLV) is a valuable source of information that arises when multiple
human annotators provide different labels for valid reasons. In Natural Language Inference …