Language generation models can cause harm: So what can we do about it? An actionable survey

S Kumar, V Balachandran, L Njoo… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent advances in the capacity of large language models to generate human-like text have
resulted in their increased adoption in user-facing settings. In parallel, these improvements …

SOLD: Sinhala offensive language dataset

T Ranasinghe, I Anuradha, D Premasiri, K Silva… - Language Resources …, 2024 - Springer
The widespread of offensive content online, such as hate speech and cyber-bullying, is a
global phenomenon. This has sparked interest in the artificial intelligence (AI) and natural …

Offensive language identification in low-resourced code-mixed Dravidian languages using pseudo-labeling

A Hande, K Puranik, K Yasaswini… - arXiv preprint arXiv …, 2021 - arxiv.org
Social media has effectively become the prime hub of communication and digital marketing.
As these platforms enable the free manifestation of thoughts and facts in text, images and …

Multi-task learning for toxic comment classification and rationale extraction

KB Nelatoori, HB Kommanti - Journal of Intelligent Information Systems, 2023 - Springer
Social media content moderation is the standard practice as on today to promote healthy
discussion forums. Toxic span prediction is helpful for explaining the toxic comment …

Towards building a robust toxicity predictor

D Bespalov, S Bhabesh, Y Xiang, L Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent NLP literature pays little attention to the robustness of toxicity language predictors,
while these systems are most likely to be used in adversarial contexts. This paper presents a …

Toxic comment classification and rationale extraction in code-mixed text leveraging co-attentive multi-task learning

KB Nelatoori, HB Kommanti - Language Resources and Evaluation, 2024 - Springer
Detecting toxic comments and rationale for the offensiveness of a social media post
promotes moderation of social media content. For this purpose, we propose a Co-Attentive …

The unappreciated role of intent in algorithmic moderation of social media content

X Wang, S Koneru, PN Venkit, B Frischmann… - arXiv preprint arXiv …, 2024 - arxiv.org
As social media has become a predominant mode of communication globally, the rise of
abusive content threatens to undermine civil discourse. Recognizing the critical nature of …

Hierarchical Adversarial Correction to Mitigate Identity Term Bias in Toxicity Detection

J Schäfer, U Heid, R Klinger - Proceedings of the 14th Workshop …, 2024 - aclanthology.org
Corpora that are the fundament for toxicity detection contain such expressions typically
directed against a target individual or group, e.g., people of a specific gender or ethnicity …

TAR on social media: A framework for online content moderation

E Yang, DD Lewis, O Frieder - arXiv preprint arXiv:2108.12752, 2021 - arxiv.org
Content moderation (removing or limiting the distribution of posts based on their contents) is
one tool social networks use to fight problems such as harassment and disinformation …

A Novel Interpretability Metric for Explaining Bias in Language Models: Applications on Multilingual Models from Southeast Asia

LCL Gamboa, M Lee - arXiv preprint arXiv:2410.15464, 2024 - arxiv.org
Work on bias in pretrained language models (PLMs) focuses on bias evaluation and
mitigation and fails to tackle the question of bias attribution and explainability. We propose a …