Hate speech classifiers learn normative social stereotypes

AM Davani, M Atari, B Kennedy… - Transactions of the …, 2023 - direct.mit.edu
Social stereotypes negatively impact individuals' judgments about different groups and may
have a critical role in understanding language directed toward marginalized groups. Here …

Annotators with attitudes: How annotator beliefs and identities bias toxic language detection

M Sap, S Swayamdipta, L Vianna, X Zhou… - arXiv preprint arXiv …, 2021 - arxiv.org
The perceived toxicity of language can vary based on someone's identity and beliefs, but
this variation is often ignored when collecting toxic language datasets, resulting in dataset …

Quality aspects of annotated data: A research synthesis

J Beck - AStA Wirtschafts- und Sozialstatistisches Archiv, 2023 - Springer
Abstract The quality of Machine Learning (ML) applications is commonly assessed by
quantifying how well an algorithm fits its respective training data. Yet, a perfect model that …

Detectors for safe and reliable LLMs: Implementations, uses, and limitations

S Achintalwar, AA Garcia, A Anaby-Tavor… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are susceptible to a variety of risks, from non-faithful output
to biased and toxic generations. Due to several limiting factors surrounding LLMs (training …

Annotation sensitivity: Training data collection methods affect model performance

C Kern, S Eckman, J Beck, R Chew, B Ma… - arXiv preprint arXiv …, 2023 - arxiv.org
When training data are collected from human annotators, the design of the annotation
instrument, the instructions given to annotators, the characteristics of the annotators, and …

GRASP: a disagreement analysis framework to assess group associations in perspectives

V Prabhakaran, C Homan, L Aroyo, AM Davani… - arXiv preprint arXiv …, 2023 - arxiv.org
Human annotation plays a core role in machine learning--annotations for supervised
models, safety guardrails for generative models, and human feedback for reinforcement …

Critical perspectives: A benchmark revealing pitfalls in PerspectiveAPI

L Rosenblatt, L Piedras, J Wilkins - Proceedings of the Second …, 2022 - aclanthology.org
Detecting “toxic” language in internet content is a pressing social and technical challenge. In
this work, we focus on Perspective API from Jigsaw, a state-of-the-art tool that promises to …

SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata

M Díaz, S Dev, E Reif, E Denton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Decisions about how to responsibly collect, use and document data often rely upon
understanding how people are represented in data. Yet, the unlabeled nature and scale of …

The risks of machine learning systems

S Tan, A Taeihagh, K Baxter - arXiv preprint arXiv:2204.09852, 2022 - arxiv.org
The speed and scale at which machine learning (ML) systems are deployed are
accelerating even as an increasing number of studies highlight their potential for negative …