Gradient based feature attribution in explainable AI: A technical review

Y Wang, T Zhang, X Guo, Z Shen - arXiv preprint arXiv:2403.10415, 2024 - arxiv.org
The surge in black-box AI models has prompted the need to explain the internal mechanism
and justify their reliability, especially in high-stakes applications, such as healthcare and …

Benchmarking and survey of explanation methods for black box models

F Bodria, F Giannotti, R Guidotti, F Naretto… - Data Mining and …, 2023 - Springer
The rise of sophisticated black-box machine learning models in Artificial Intelligence
systems has prompted the need for explanation methods that reveal how these models work …

Attcat: Explaining transformers via attentive class activation tokens

Y Qiang, D Pan, C Li, X Li, R Jang… - Advances in neural …, 2022 - proceedings.neurips.cc
Transformers have improved the state-of-the-art in various natural language processing and
computer vision tasks. However, the success of the Transformer model has not yet been duly …

IDGI: A framework to eliminate explanation noise from integrated gradients

R Yang, B Wang, M Bilgic - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Integrated Gradients (IG) as well as its variants are well-known techniques for interpreting
the decisions of deep neural networks. While IG-based approaches attain state-of-the-art …
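Since several of the entries above build on Integrated Gradients, a minimal sketch of the base method may be useful. This is a generic Riemann-sum approximation in NumPy, not the IDGI framework itself; `grad_fn` is an assumed callable standing in for any framework-specific gradient computation.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=50):
    """Approximate Integrated Gradients with a right-endpoint Riemann sum.

    grad_fn(z) must return the gradient of the model's target output
    with respect to the input z (same shape as z).
    """
    # Interpolation coefficients along the straight line from baseline
    # to x, skipping alpha = 0 (the baseline itself).
    alphas = np.linspace(0.0, 1.0, steps + 1)[1:]
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    avg_grad = total / steps
    # Scale averaged gradients by the input difference (completeness:
    # attributions sum to f(x) - f(baseline) for exact integration).
    return (x - baseline) * avg_grad
```

For a linear model f(x) = w·x the gradient is constant, so the attributions are exactly (x − baseline) ⊙ w and sum to f(x) − f(baseline), which illustrates the completeness axiom the IG-based papers above rely on.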

Fast axiomatic attribution for neural networks

R Hesse, S Schaub-Meyer… - Advances in Neural …, 2021 - proceedings.neurips.cc
Mitigating the dependence on spurious correlations present in the training dataset is a
quickly emerging and important topic of deep learning. Recent approaches include priors on …

MFABA: a more faithful and accelerated boundary-based attribution method for deep neural networks

Z Zhu, H Chen, J Zhang, X Wang, Z Jin, M Xue… - Proceedings of the …, 2024 - ojs.aaai.org
To better understand the output of deep neural networks (DNN), attribution based methods
have been an important approach for model interpretability, which assign a score for each …

Local path integration for attribution

P Yang, N Akhtar, Z Wen, A Mian - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Path attribution methods are a popular tool to interpret a visual model's prediction on an
input. They integrate model gradients for the input features over a path defined between the …

Interpretability-aware vision transformer

Y Qiang, C Li, P Khanduri, D Zhu - arXiv preprint arXiv:2309.08035, 2023 - arxiv.org
Vision Transformers (ViTs) have become prominent models for solving various vision tasks.
However, the interpretability of ViTs has not kept pace with their promising performance …

Improving adversarial transferability via frequency-based stationary point search

Z Zhu, H Chen, J Zhang, X Wang, Z Jin, Q Lu… - Proceedings of the …, 2023 - dl.acm.org
Deep neural networks (DNNs) have been shown vulnerable to interference from adversarial
samples, leading to erroneous predictions. Investigating adversarial attacks can effectively …

Towards credible visual model interpretation with path attribution

N Akhtar, MAAK Jalwana - International Conference on …, 2023 - proceedings.mlr.press
With its inspirational roots in game theory, the path attribution framework stands out among the
post-hoc model interpretation techniques due to its axiomatic nature. However, recent …