Methods for interpreting and understanding deep neural networks

G Montavon, W Samek, KR Müller - Digital Signal Processing, 2018 - Elsevier
This paper provides an entry point to the problem of interpreting a deep neural network
model and explaining its predictions. It is based on a tutorial given at ICASSP 2017. As a …

Post-hoc interpretability for neural NLP: A survey

A Madsen, S Reddy, S Chandar - ACM Computing Surveys, 2022 - dl.acm.org
Neural networks for NLP are becoming increasingly complex and widespread, and there is
growing concern about whether these models are responsible to use. Explaining models helps to address …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

Unmasking Clever Hans predictors and assessing what machines really learn

S Lapuschkin, S Wäldchen, A Binder… - Nature …, 2019 - nature.com
Current learning machines have successfully solved hard application problems, reaching
high accuracy and displaying seemingly intelligent behavior. Here we apply recent …

Explainable artificial intelligence: A survey

FK Došilović, M Brčić, N Hlupić - 2018 41st International …, 2018 - ieeexplore.ieee.org
In the last decade, with the availability of large datasets and more computing power, machine
learning systems have achieved (super)human performance in a wide variety of tasks …

A survey of the state of explainable AI for natural language processing

M Danilevsky, K Qian, R Aharonov, Y Katsis… - arXiv preprint arXiv …, 2020 - arxiv.org
Recent years have seen important advances in the quality of state-of-the-art models, but this
has come at the expense of models becoming less interpretable. This survey presents an …

Is attention interpretable?

S Serrano, NA Smith - arXiv preprint arXiv:1906.03731, 2019 - arxiv.org
Attention mechanisms have recently boosted performance on a range of NLP tasks.
Because attention layers explicitly weight input components' representations, it is also often …

Generating natural language adversarial examples through probability weighted word saliency

S Ren, Y Deng, K He, W Che - … of the 57th annual meeting of the …, 2019 - aclanthology.org
We address the problem of adversarial attacks on text classification, which is rarely studied
compared to attacks on image classification. The challenge of this task is to generate …

Linguistic Knowledge and Transferability of Contextual Representations

NF Liu - arXiv preprint arXiv:1903.08855, 2019 - fq.pkwyx.com
Contextual word representations derived from large-scale neural language models are
successful across a diverse set of NLP tasks, suggesting that they encode useful and …