Explaining deep neural networks and beyond: A review of methods and applications

W Samek, G Montavon, S Lapuschkin… - Proceedings of the …, 2021 - ieeexplore.ieee.org
With the broader and highly successful usage of machine learning (ML) in industry and the
sciences, there has been a growing demand for explainable artificial intelligence (XAI) …

Interpretable and explainable machine learning: a methods-centric overview with concrete examples

R Marcinkevičs, JE Vogt - Wiley Interdisciplinary Reviews: Data …, 2023 - Wiley Online Library
Interpretability and explainability are crucial for machine learning (ML) and statistical
applications in medicine, economics, law, and natural sciences and form an essential …

Captum: A unified and generic model interpretability library for PyTorch

N Kokhlikyan, V Miglani, M Martin, E Wang… - arXiv preprint arXiv …, 2020 - arxiv.org
In this paper we introduce a novel, unified, open-source model interpretability library for
PyTorch [12]. The library contains generic implementations of a number of gradient and …

Pruning neural networks without any data by iteratively conserving synaptic flow

H Tanaka, D Kunin, DL Yamins… - Advances in neural …, 2020 - proceedings.neurips.cc
Pruning the parameters of deep neural networks has generated intense interest due to
potential savings in time, memory and energy both during training and at test time. Recent …

Improving adversarial transferability via neuron attribution-based attacks

J Zhang, W Wu, J Huang, Y Huang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Deep neural networks (DNNs) are known to be vulnerable to adversarial examples. It is thus
imperative to devise effective attack algorithms to identify the deficiencies of DNNs …

Interpretability and fairness evaluation of deep learning models on MIMIC-IV dataset

C Meng, L Trinh, N Xu, J Enouen, Y Liu - Scientific Reports, 2022 - nature.com
The recent release of large-scale healthcare datasets has greatly propelled the research of
data-driven deep learning models for healthcare applications. However, due to the nature of …

Finding neurons in a haystack: Case studies with sparse probing

W Gurnee, N Nanda, M Pauly, K Harvey… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite rapid adoption and deployment of large language models (LLMs), the internal
computations of these models remain opaque and poorly understood. In this work, we seek …

Explaining in style: Training a GAN to explain a classifier in StyleSpace

O Lang, Y Gandelsman, M Yarom… - Proceedings of the …, 2021 - openaccess.thecvf.com
Image classification models can depend on multiple different semantic attributes of the
image. An explanation of the decision of the classifier needs to both discover and visualize …

Interpretability and explainability: A machine learning zoo mini-tour

R Marcinkevičs, JE Vogt - arXiv preprint arXiv:2012.01805, 2020 - arxiv.org
In this review, we examine the problem of designing interpretable and explainable machine
learning models. Interpretability and explainability lie at the core of many machine learning …

Model fusion via optimal transport

SP Singh, M Jaggi - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Combining different models is a widely used paradigm in machine learning applications.
While the most common approach is to form an ensemble of models and average their …