Recent advances in hate speech moderation: Multimodality and the role of large models
In the evolving landscape of online communication, moderating hate speech (HS) presents
an intricate challenge, compounded by the multimodal nature of digital content. This …
Social media hate speech detection using explainable artificial intelligence (XAI)
Explainable artificial intelligence (XAI) characteristics have flexible and multifaceted
potential in hate speech detection by deep learning models. Interpreting and explaining …
Challenges in applying explainability methods to improve the fairness of NLP models
Motivations for methods in explainable artificial intelligence (XAI) often include detecting,
quantifying and mitigating bias, and contributing to making machine learning models fairer …
Explainable abuse detection as intent classification and slot filling
To proactively offer social media users a safe online experience, there is a need for systems
that can detect harmful posts and promptly alert platform moderators. In order to guarantee …
Robustness of models addressing Information Disorder: A comprehensive review and benchmarking study
Machine learning and deep learning models are increasingly susceptible to
adversarial attacks, particularly in critical areas like cybersecurity and Information Disorder …
Systematic keyword and bias analyses in hate speech detection
Hate speech detection refers broadly to the automatic identification of language that may be
considered discriminatory against certain groups of people. The goal is to help online …
InterroLang: Exploring NLP models and datasets through dialogue-based explanations
While recently developed NLP explainability methods let us open the black box in various
ways (Madsen et al., 2022), a missing ingredient in this endeavor is an interactive tool …
Towards trustworthy explanation: On causal rationalization
With recent advances in natural language processing, rationalization becomes an essential
self-explaining diagram to disentangle the black box by selecting a subset of input texts to …
DDImage: an image reduction based approach for automatically explaining black-box classifiers
Due to the prevalent application of machine learning (ML) techniques and the intrinsic black-
box nature of ML models, the need for good explanations that are sufficient and necessary …
Explaining Finetuned Transformers on Hate Speech Predictions Using Layerwise Relevance Propagation
Explainability of model predictions has become imperative for architectures that involve fine-
tuning of a pretrained transformer encoder for a downstream task such as hate speech …