Under the surface: Tracking the artifactuality of LLM-generated data

D Das, K De Langis, A Martin-Boyle, J Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
This work delves into the expanding role of large language models (LLMs) in generating
artificial data. LLMs are increasingly employed to create a variety of outputs, including …

Perspectivist approaches to natural language processing: a survey

S Frenda, G Abercrombie, V Basile, A Pedrani… - Language Resources …, 2024 - Springer
In Artificial Intelligence research, perspectivism is an approach to machine learning
that aims at leveraging data annotated by different individuals in order to model varied …

How (not) to use sociodemographic information for subjective NLP tasks

T Beck, H Schuff, A Lauscher, I Gurevych - arXiv preprint arXiv:2309.07034, 2023 - arxiv.org
Annotators' sociodemographic backgrounds (i.e., the individual compositions of their gender,
age, educational background, etc.) have a strong impact on their decisions when working on …

The ecological fallacy in annotation: Modelling human label variation goes beyond sociodemographics

M Orlikowski, P Röttger, P Cimiano, D Hovy - arXiv preprint arXiv …, 2023 - arxiv.org
Many NLP tasks exhibit human label variation, where different annotators give different
labels to the same texts. This variation is known to depend, at least in part, on the …

You are what you annotate: Towards better models through annotator representations

N Deng, XF Zhang, S Liu, W Wu, L Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Annotator disagreement is ubiquitous in natural language processing (NLP) tasks. There are
multiple reasons for such disagreements, including the subjectivity of the task, difficult cases …

"Fifty Shades of Bias": Normative Ratings of Gender Bias in GPT Generated English Text

R Hada, A Seth, H Diddee, K Bali - arXiv preprint arXiv:2310.17428, 2023 - arxiv.org
Language serves as a powerful tool for the manifestation of societal belief systems. In doing
so, it also perpetuates the prevalent biases in our society. Gender bias is one of the most …

Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis

N Lee, C Jung, J Myung, J Jin… - Proceedings of the …, 2024 - aclanthology.org
Warning: this paper contains content that may be offensive or upsetting. Most hate speech
datasets neglect the cultural diversity within a single language, resulting in a critical …

Quantifying the persona effect in LLM simulations

T Hu, N Collier - arXiv preprint arXiv:2402.10811, 2024 - arxiv.org
Large language models (LLMs) have shown remarkable promise in simulating human
language use and behavior. In this study, we delve into the intersection of persona variables …

CREHate: Cross-cultural re-annotation of English hate speech dataset

N Lee, C Jung, J Myung, J Jin, J Kim, A Oh - arXiv preprint arXiv …, 2023 - arxiv.org
English datasets predominantly reflect the perspectives of certain nationalities, which can
lead to cultural biases in models and datasets. This is particularly problematic in tasks …

Akal Badi ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology

R Hada, S Husain, V Gumma, H Diddee… - The 2024 ACM …, 2024 - dl.acm.org
Existing research in measuring and mitigating gender bias predominantly centers on
English, overlooking the intricate challenges posed by non-English languages and the …