A survey on fairness in large language models

Y Li, M Du, R Song, X Wang, Y Wang - arXiv preprint arXiv:2308.10149, 2023 - arxiv.org
Large language models (LLMs) have shown powerful performance and promising development
prospects and are widely deployed in the real world. However, LLMs can capture social …

Handling bias in toxic speech detection: A survey

T Garg, S Masud, T Suresh, T Chakraborty - ACM Computing Surveys, 2023 - dl.acm.org
Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors
such as the context, geography, socio-political climate, and background of the producers …

ToxiGen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection

T Hartvigsen, S Gabriel, H Palangi, M Sap… - arXiv preprint arXiv …, 2022 - arxiv.org
Toxic language detection systems often falsely flag text that contains minority group
mentions as toxic, as those groups are often the targets of online hate. Such over-reliance …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Dealing with disagreements: Looking beyond the majority vote in subjective annotations

AM Davani, M Díaz, V Prabhakaran - Transactions of the Association …, 2022 - direct.mit.edu
Majority voting and averaging are common approaches used to resolve annotator
disagreements and derive single ground truth labels from multiple annotations. However …

Challenges in detoxifying language models

J Welbl, A Glaese, J Uesato, S Dathathri… - arXiv preprint arXiv …, 2021 - arxiv.org
Large language models (LMs) generate remarkably fluent text and can be efficiently adapted
across NLP tasks. Measuring and guaranteeing the quality of generated text in terms of …

ProsocialDialog: A prosocial backbone for conversational agents

H Kim, Y Yu, L Jiang, X Lu, D Khashabi, G Kim… - arXiv preprint arXiv …, 2022 - arxiv.org
Most existing dialogue systems fail to respond properly to potentially unsafe user utterances
by either ignoring or passively agreeing with them. To address this issue, we introduce …

MOMENTA: A multimodal framework for detecting harmful memes and their targets

S Pramanick, S Sharma, D Dimitrov, MS Akhtar… - arXiv preprint arXiv …, 2021 - arxiv.org
Internet memes have become powerful means to transmit political, psychological, and socio-
cultural ideas. Although memes are typically humorous, recent times have witnessed an …

Having beer after prayer? Measuring cultural bias in large language models

T Naous, MJ Ryan, A Ritter, W Xu - arXiv preprint arXiv:2305.14456, 2023 - arxiv.org
As the reach of large language models (LMs) expands globally, their ability to cater to
diverse cultural contexts becomes crucial. Despite advancements in multilingual …

SafetyKit: First aid for measuring safety in open-domain conversational systems

E Dinan, G Abercrombie, SA Bergman… - Proceedings of the …, 2022 - iris.unibocconi.it
The social impact of natural language processing and its applications has received
increasing attention. In this position paper, we focus on the problem of safety for end-to-end …