Towards trustworthy LLMs: a review on debiasing and dehallucinating in large language models

Z Lin, S Guan, W Zhang, H Zhang, Y Li… - Artificial Intelligence …, 2024 - Springer
Recently, large language models (LLMs) have attracted considerable attention due to their
remarkable capabilities. However, LLMs' generation of biased or hallucinatory content …

Language (technology) is power: A critical survey of "bias" in NLP

SL Blodgett, S Barocas, H Daumé III… - arXiv preprint arXiv …, 2020 - arxiv.org
We survey 146 papers analyzing "bias" in NLP systems, finding that their motivations are
often vague, inconsistent, and lacking in normative reasoning, despite the fact that …

Unlearning bias in language models by partitioning gradients

C Yu, S Jeoung, A Kasi, P Yu, H Ji - Findings of the Association for …, 2023 - aclanthology.org
Recent research has shown that large-scale pretrained language models, specifically
transformers, tend to exhibit issues relating to racism, sexism, religion bias, and toxicity in …

A survey on gender bias in natural language processing

K Stanczak, I Augenstein - arXiv preprint arXiv:2112.14168, 2021 - arxiv.org
Language can be used as a means of reproducing and enforcing harmful stereotypes and
biases and has been analysed as such in numerous research. In this paper, we present a …

RedditBias: A real-world resource for bias evaluation and debiasing of conversational language models

S Barikeri, A Lauscher, I Vulić, G Glavaš - arXiv preprint arXiv:2106.03521, 2021 - arxiv.org
Text representation models are prone to exhibit a range of societal biases, reflecting the non-
controlled and biased nature of the underlying pretraining data, which consequently leads to …

Sustainable modular debiasing of language models

A Lauscher, T Lueken, G Glavaš - arXiv preprint arXiv:2109.03646, 2021 - arxiv.org
Unfair stereotypical biases (e.g., gender, racial, or religious biases) encoded in modern
pretrained language models (PLMs) have negative ethical implications for widespread …

A survey of race, racism, and anti-racism in NLP

A Field, SL Blodgett, Z Waseem, Y Tsvetkov - arXiv preprint arXiv …, 2021 - arxiv.org
Despite inextricable ties between race and language, little work has considered race in NLP
research and development. In this work, we survey 79 papers from the ACL anthology that …

Debiasing pre-trained contextualised embeddings

M Kaneko, D Bollegala - arXiv preprint arXiv:2101.09523, 2021 - arxiv.org
In comparison to the numerous debiasing methods proposed for the static non-
contextualised word embeddings, the discriminative biases in contextualised embeddings …

Survey on sociodemographic bias in natural language processing

V Gupta, PN Venkit, S Wilson… - arXiv preprint arXiv …, 2023 - researchgate.net
Deep neural networks often learn unintended bias during training, which might have harmful
effects when deployed in real-world settings. This work surveys 214 papers related to …

Unmasking the mask – evaluating social biases in masked language models

M Kaneko, D Bollegala - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Masked Language Models (MLMs) have shown superior performances in
numerous downstream Natural Language Processing (NLP) tasks. Unfortunately, MLMs …