Under the surface: Tracking the artifactuality of LLM-generated data
This work delves into the expanding role of large language models (LLMs) in generating
artificial data. LLMs are increasingly employed to create a variety of outputs, including …
Perspectivist approaches to natural language processing: a survey
In Artificial Intelligence research, perspectivism is an approach to machine learning
that aims at leveraging data annotated by different individuals in order to model varied …
How (not) to use sociodemographic information for subjective NLP tasks
Annotators' sociodemographic backgrounds (i.e., the individual compositions of their gender,
age, educational background, etc.) have a strong impact on their decisions when working on …
The ecological fallacy in annotation: Modelling human label variation goes beyond sociodemographics
Many NLP tasks exhibit human label variation, where different annotators give different
labels to the same texts. This variation is known to depend, at least in part, on the …
You are what you annotate: Towards better models through annotator representations
Annotator disagreement is ubiquitous in natural language processing (NLP) tasks. There are
multiple reasons for such disagreements, including the subjectivity of the task, difficult cases …
"Fifty Shades of Bias": Normative Ratings of Gender Bias in GPT Generated English Text
Language serves as a powerful tool for the manifestation of societal belief systems. In doing
so, it also perpetuates the prevalent biases in our society. Gender bias is one of the most …
Exploring Cross-Cultural Differences in English Hate Speech Annotations: From Dataset Construction to Analysis
Warning: this paper contains content that may be offensive or upsetting. Most hate speech
datasets neglect the cultural diversity within a single language, resulting in a critical …
Quantifying the persona effect in LLM simulations
Large language models (LLMs) have shown remarkable promise in simulating human
language use and behavior. In this study, we delve into the intersection of persona variables …
CREHate: Cross-cultural re-annotation of English hate speech dataset
English datasets predominantly reflect the perspectives of certain nationalities, which can
lead to cultural biases in models and datasets. This is particularly problematic in tasks …
Akal Badi ya Bias: An Exploratory Study of Gender Bias in Hindi Language Technology
Existing research in measuring and mitigating gender bias predominantly centers on
English, overlooking the intricate challenges posed by non-English languages and the …