Handling bias in toxic speech detection: A survey

T Garg, S Masud, T Suresh, T Chakraborty - ACM Computing Surveys, 2023 - dl.acm.org
Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors
such as the context, geography, socio-political climate, and background of the producers …

The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

B Plank - arXiv preprint arXiv:2211.02570, 2022 - arxiv.org
Human variation in labeling is often considered noise. Annotation projects for machine
learning (ML) aim to minimize human label variation, on the assumption that doing so maximizes …

LaMP: When large language models meet personalization

A Salemi, S Mysore, M Bendersky, H Zamani - arXiv preprint arXiv …, 2023 - arxiv.org
This paper highlights the importance of personalization in large language models and
introduces the LaMP benchmark, a novel benchmark for training and evaluating language …

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

HR Kirk, B Vidgen, P Röttger, SA Hale - arXiv preprint arXiv:2303.05453, 2023 - arxiv.org
Large language models (LLMs) are used to generate content for a wide range of tasks, and
are set to reach a growing audience in coming years due to integration in product interfaces …

DICES dataset: Diversity in conversational AI evaluation for safety

L Aroyo, A Taylor, M Diaz, C Homan… - Advances in …, 2023 - proceedings.neurips.cc
Machine learning approaches often require training and evaluation datasets with a
clear separation between positive and negative examples. This requirement overly …

Hate speech classifiers learn normative social stereotypes

AM Davani, M Atari, B Kennedy… - Transactions of the …, 2023 - direct.mit.edu
Social stereotypes negatively impact individuals' judgments about different groups and may
have a critical role in understanding language directed toward marginalized groups. Here …

Why don't you do it right? Analysing annotators' disagreement in subjective tasks

M Sandri, E Leonardelli, S Tonelli… - Proceedings of the 17th …, 2023 - aclanthology.org
Annotators' disagreement in linguistic data has recently been the focus of multiple initiatives
aimed at raising awareness of issues related to 'majority voting' when aggregating diverging …

Human-centered neural reasoning for subjective content processing: Hate speech, emotions, and humor

P Kazienko, J Bielaniewicz, M Gruza, K Kanclerz… - Information …, 2023 - Elsevier
Some tasks in content processing, e.g., in natural language processing (NLP), such as hate or
offensive speech detection and emotional or funny text detection, are subjective by nature. Each …

STELA: a community-centred approach to norm elicitation for AI alignment

S Bergman, N Marchal, J Mellor, S Mohamed… - Scientific Reports, 2024 - nature.com
Value alignment, the process of ensuring that artificial intelligence (AI) systems are aligned
with human values and goals, is a critical issue in AI research. Existing scholarship has …

SimpleSafetyTests: a test suite for identifying critical safety risks in large language models

B Vidgen, N Scherrer, HR Kirk, R Qian… - arXiv preprint arXiv …, 2023 - arxiv.org
The past year has seen rapid acceleration in the development of large language models
(LLMs). However, without proper steering and safeguards, LLMs will readily follow malicious …