SafetyPrompts: A Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

P Röttger, F Pernisi, B Vidgen, D Hovy - arXiv preprint arXiv …, 2024 - arxiv.org
… millions of users write texts about diverse issues, and in doing so expose users to different ideas and perspectives. This creates …

Revisiting Rogers' Paradox in the Context of Human-AI Interaction

KM Collins, U Bhatt, I Sucholutsky - arXiv preprint arXiv:2501.10476, 2025 - arxiv.org
Humans learn about the world, and how to act in the world, in many ways: from individually
conducting experiments to observing and reproducing others' behavior. Different learning …

Culture is Not Trivia: Sociocultural Theory for Cultural NLP

N Zhou, D Bamman, IL Bleaman - arXiv preprint arXiv:2502.12057, 2025 - arxiv.org
The field of cultural NLP has recently experienced rapid growth, driven by a pressing need
to ensure that language technologies are effective and safe across a pluralistic user base …

Correcting Annotator Bias in Training Data: Population-Aligned Instance Replication (PAIR)

S Eckman, B Ma, C Kern, R Chew, B Plank… - arXiv preprint arXiv …, 2025 - arxiv.org
Models trained on crowdsourced labels may not reflect broader population views when
annotator pools are not representative. Since collecting representative labels is challenging …