Towards trustworthy AI: A review of ethical and robust large language models

MM Ferdaus, M Abdelguerfi, E Ioup, KN Niles… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid progress in Large Language Models (LLMs) could transform many fields, but their
fast development creates significant challenges for oversight, ethical creation, and building …

Nbias: A natural language processing framework for bias identification in text

S Raza, M Garg, DJ Reji, SR Bashir, C Ding - Expert Systems with …, 2024 - Elsevier
Bias in textual data can lead to skewed interpretations and outcomes when the data is used.
These biases could perpetuate stereotypes, discrimination, or other forms of unfair …

Addressing bias in generative AI: Challenges and research opportunities in information management

X Wei, N Kumar, H Zhang - arXiv preprint arXiv:2502.10407, 2025 - arxiv.org
Generative AI technologies, particularly Large Language Models (LLMs), have transformed
information management systems but introduced substantial biases that can compromise …

ChatGPT based data augmentation for improved parameter-efficient debiasing of LLMs

P Han, R Kocielnik, A Saravanan, R Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs), while powerful, exhibit harmful social biases. Debiasing is
often challenging due to computational costs, data constraints, and potential degradation of …
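
The abstract breaks off before the method, but the title names the core recipe: using a chat model to generate counterfactual training data for lightweight debiasing. The sketch below illustrates only that general idea; the prompt wording and the `generate` callable (a stand-in for any chat-model API) are assumptions, not the paper's pipeline.

```python
# Hedged sketch of LLM-based counterfactual augmentation; `generate` is a
# hypothetical stand-in for any chat-model call, not the paper's code.
PROMPT_TEMPLATE = (
    "Rewrite the sentence, swapping every demographic reference to its "
    "counterpart (e.g. 'he' -> 'she') and changing nothing else:\n{text}"
)

def augment(sentences, generate):
    """Pair each sentence with an LLM-written counterfactual, yielding a
    demographically balanced set for parameter-efficient fine-tuning."""
    out = []
    for text in sentences:
        out.append(text)
        out.append(generate(PROMPT_TEMPLATE.format(text=text)))
    return out
```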

Metrics for what, metrics for whom: assessing actionability of bias evaluation metrics in NLP

P Delobelle, G Attanasio, D Nozza… - Proceedings of the …, 2024 - iris.unibocconi.it
This paper introduces the concept of actionability in the context of bias measures in natural
language processing (NLP). We define actionability as the degree to which a …

Fair Text Classification with Wasserstein Independence

T Leteno, A Gourru, C Laclau, R Emonet… - arXiv preprint arXiv …, 2023 - arxiv.org
Group fairness is a central research topic in text classification, where reaching fair treatment
between sensitive groups (e.g., women vs. men) remains an open challenge. This paper …
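
The abstract is cut off before the method, so the sketch below only illustrates the generic idea the title points at: regularizing a text classifier so its output distribution looks similar across sensitive groups, measured with a 1-D Wasserstein distance. The quantile approximation, the penalty weight `lam`, and the binary group encoding are assumptions, not the paper's formulation.

```python
import torch

def wasserstein_1d(a, b, n_quantiles=32):
    """Approximate 1-D Wasserstein-1 distance between two empirical score
    distributions by comparing matched quantiles (differentiable)."""
    qs = torch.linspace(0.0, 1.0, n_quantiles, device=a.device)
    return (torch.quantile(a, qs) - torch.quantile(b, qs)).abs().mean()

def fair_loss(logits, labels, group, lam=0.1):
    """Cross-entropy plus a penalty pulling the positive-class score
    distributions of the two sensitive groups (0 / 1) together."""
    task = torch.nn.functional.cross_entropy(logits, labels)
    scores = logits.softmax(dim=-1)[:, 1]
    return task + lam * wasserstein_1d(scores[group == 0], scores[group == 1])
```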

Gender Bias Mitigation for Bangla Classification Tasks

SKS Joy, AH Mahy, M Sultana, AM Abha… - arXiv preprint arXiv …, 2024 - arxiv.org
In this study, we investigate gender bias in Bangla pretrained language models, a largely
underexplored area in low-resource languages. To assess this bias, we applied gender …

A Data-Centric Approach to Detecting and Mitigating Demographic Bias in Pediatric Mental Health Text: A Case Study in Anxiety Detection

J Ive, P Bondaronek, V Yadav, D Santel… - arXiv preprint arXiv …, 2024 - arxiv.org
Introduction: Healthcare AI models often inherit biases from their training data. While efforts
have primarily targeted bias in structured data, mental health heavily depends on …

REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning

R Qureshi, N Es-Sebbani, L Galárraga, Y Graham… - ECAI 2024, 2024 - ebooks.iospress.nl
With the introduction of (large) language models, there has been significant concern about
the unintended bias such models may inherit from their training data. A number of studies …
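
The snippet stops before the approach, so the sketch below is only a generic illustration of debiasing a language model with reinforcement learning: one REINFORCE step against an external bias scorer. The HF-style `model`/`tokenizer` interface, the `bias_score` function, and all hyperparameters are assumptions, not REFINE-LM's actual algorithm.

```python
import torch

def reinforce_debias_step(model, tokenizer, prompt, bias_score, optimizer):
    """One REINFORCE update: sample a completion, score it with an assumed
    external bias_score (higher = more biased), then re-run a differentiable
    forward pass and nudge the sampled tokens' log-probs by the reward."""
    enc = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():                              # sampling phase only
        seq = model.generate(**enc, do_sample=True, max_new_tokens=20)
    gen = seq[0, enc["input_ids"].shape[1]:]           # sampled completion
    logp = torch.log_softmax(model(seq).logits[0, :-1], dim=-1)
    start = enc["input_ids"].shape[1] - 1              # rows predicting gen
    tok_logp = logp[start:start + len(gen)].gather(1, gen.unsqueeze(1)).sum()
    reward = -bias_score(tokenizer.decode(gen))        # less bias, more reward
    loss = -reward * tok_logp                          # REINFORCE objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```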

Breaking Bias: Alpha Weighted Loss in Multi-objective Learning Taming Gender Stereotypes

MN Amin, A Al Imran, FS Bayram, L Hübner… - World Conference on …, 2023 - Springer
Navigating the uncertainties of job classification and gender bias, this paper presents a multi-
objective learning approach using a BERT-based model that concurrently handles maximizing …
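
The entry breaks off mid-sentence, but the title describes a familiar construction: a convex, alpha-weighted combination of a task loss and a bias loss. A minimal sketch follows, assuming two precomputed loss terms and a fixed alpha; the default 0.7 and the term names are illustrative, not the paper's.

```python
import torch

def alpha_weighted_loss(task_loss: torch.Tensor,
                        bias_loss: torch.Tensor,
                        alpha: float = 0.7) -> torch.Tensor:
    """Convex combination of the two objectives: alpha trades job-
    classification accuracy against the gender-stereotype penalty."""
    return alpha * task_loss + (1.0 - alpha) * bias_loss
```

Raising alpha privileges classification accuracy; lowering it privileges debiasing.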