From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable AI

M Nauta, J Trienes, S Pathak, E Nguyen… - ACM Computing …, 2023 - dl.acm.org
The rising popularity of explainable artificial intelligence (XAI) to understand high-performing
black boxes raised the question of how to evaluate explanations of machine learning (ML) …

Towards generalisable hate speech detection: a review on obstacles and solutions

W Yin, A Zubiaga - PeerJ Computer Science, 2021 - peerj.com
Hate speech is one type of harmful online content which directly attacks or promotes hate
towards a group or an individual member based on their actual or perceived aspects of …

Interpreting graph neural networks for NLP with differentiable edge masking

MS Schlichtkrull, N De Cao, I Titov - arXiv preprint arXiv:2010.00577, 2020 - arxiv.org
Graph neural networks (GNNs) have become a popular approach to integrating structural
inductive biases into NLP models. However, there has been little work on interpreting them …

GLUE-X: Evaluating natural language understanding models from an out-of-distribution generalization perspective

L Yang, S Zhang, L Qin, Y Li, Y Wang, H Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Pre-trained language models (PLMs) are known to improve the generalization performance
of natural language understanding models by leveraging large amounts of data during the …

Self-attention attribution: Interpreting information interactions inside transformer

Y Hao, L Dong, F Wei, K Xu - Proceedings of the AAAI Conference on …, 2021 - ojs.aaai.org
The great success of Transformer-based models benefits from the powerful multi-head self-
attention mechanism, which learns token dependencies and encodes contextual information …

Handling bias in toxic speech detection: A survey

T Garg, S Masud, T Suresh, T Chakraborty - ACM Computing Surveys, 2023 - dl.acm.org
Detecting online toxicity has always been a challenge due to its inherent subjectivity. Factors
such as the context, geography, socio-political climate, and background of the producers …

Contextualizing hate speech classifiers with post-hoc explanation

B Kennedy, X Jin, AM Davani, M Dehghani… - arXiv preprint arXiv …, 2020 - arxiv.org
Hate speech classifiers trained on imbalanced datasets struggle to determine if group
identifiers like "gay" or "black" are used in offensive or prejudiced ways. Such biases …

Exposing the limits of zero-shot cross-lingual hate speech detection

D Nozza - Proceedings of the 59th Annual Meeting of the …, 2021 - aclanthology.org
Reducing and counter-acting hate speech on Social Media is a significant concern. Most of
the proposed automatic methods are conducted exclusively on English and very few …

BERT meets Shapley: Extending SHAP explanations to transformer-based classifiers

E Kokalj, B Škrlj, N Lavrač, S Pollak… - Proceedings of the …, 2021 - aclanthology.org
Transformer-based neural networks offer very good classification performance across a
wide range of domains, but do not provide explanations of their predictions. While several …

Entropy-based attention regularization frees unintended bias mitigation from lists

G Attanasio, D Nozza, D Hovy, E Baralis - arXiv preprint arXiv:2203.09192, 2022 - arxiv.org
Natural Language Processing (NLP) models risk overfitting to specific terms in the training
data, thereby reducing their performance, fairness, and generalizability. E.g., neural hate …