Recent advances in hate speech moderation: Multimodality and the role of large models

MS Hee, S Sharma, R Cao, P Nandi… - arXiv preprint arXiv …, 2024 - arxiv.org
In the evolving landscape of online communication, moderating hate speech (HS) presents
an intricate challenge, compounded by the multimodal nature of digital content. This …

Social media hate speech detection using explainable artificial intelligence (XAI)

H Mehta, K Passi - Algorithms, 2022 - mdpi.com
Explainable artificial intelligence (XAI) characteristics have flexible and multifaceted
potential in hate speech detection by deep learning models. Interpreting and explaining …

Challenges in applying explainability methods to improve the fairness of NLP models

E Balkir, S Kiritchenko, I Nejadgholi… - arXiv preprint arXiv …, 2022 - arxiv.org
Motivations for methods in explainable artificial intelligence (XAI) often include detecting,
quantifying and mitigating bias, and contributing to making machine learning models fairer …

Explainable abuse detection as intent classification and slot filling

A Calabrese, B Ross, M Lapata - Transactions of the Association for …, 2022 - direct.mit.edu
To proactively offer social media users a safe online experience, there is a need for systems
that can detect harmful posts and promptly alert platform moderators. In order to guarantee …

Robustness of models addressing Information Disorder: A comprehensive review and benchmarking study

G Fenza, V Loia, C Stanzione, M Di Gisi - Neurocomputing, 2024 - Elsevier
Machine learning and deep learning models are increasingly susceptible to
adversarial attacks, particularly in critical areas like cybersecurity and Information Disorder …

Systematic keyword and bias analyses in hate speech detection

GL De la Peña Sarracén, P Rosso - Information Processing & Management, 2023 - Elsevier
Hate speech detection refers broadly to the automatic identification of language that may be
considered discriminatory against certain groups of people. The goal is to help online …

InterroLang: Exploring NLP models and datasets through dialogue-based explanations

N Feldhus, Q Wang, T Anikina, S Chopra… - arXiv preprint arXiv …, 2023 - arxiv.org
While recently developed NLP explainability methods let us open the black box in various
ways (Madsen et al., 2022), a missing ingredient in this endeavor is an interactive tool …

Towards trustworthy explanation: On causal rationalization

W Zhang, T Wu, Y Wang, Y Cai… - … Conference on Machine …, 2023 - proceedings.mlr.press
With recent advances in natural language processing, rationalization becomes an essential
self-explaining diagram to disentangle the black box by selecting a subset of input texts to …

DDImage: an image reduction based approach for automatically explaining black-box classifiers

M Jiang, C Tang, XY Zhang, Y Zhao, Z Ding - Empirical Software …, 2024 - Springer
Due to the prevalent application of machine learning (ML) techniques and the intrinsic
black-box nature of ML models, the need for good explanations that are sufficient and necessary …

Explaining Finetuned Transformers on Hate Speech Predictions Using Layerwise Relevance Propagation

R Mishra, A Yadav, RR Shah, P Kumaraguru - … Conference on Big Data …, 2023 - Springer
Explainability of model predictions has become imperative for architectures that involve
fine-tuning of a pretrained transformer encoder for a downstream task such as hate speech …