Visual attention methods in deep learning: An in-depth survey

M Hassanin, S Anwar, I Radwan, FS Khan, A Mian - Information Fusion, 2024 - Elsevier
Inspired by the human cognitive system, attention is a mechanism that imitates human
cognitive awareness of specific information, amplifying critical details to focus more on …
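(For reference, the mechanism covered by this survey reduces, in its most common form, to scaled dot-product attention. The sketch below is a minimal PyTorch rendering of that general formulation; the function name, tensor names, and shapes are illustrative and are not drawn from the survey itself.)

import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v, mask=None):
    # q, k, v: (batch, heads, seq_len, d_k); mask broadcastable to the score matrix
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5    # similarity of each query to every key
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = F.softmax(scores, dim=-1)              # normalized attention distribution over keys
    return weights @ v, weights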

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Incorporating BERT into neural machine translation

J Zhu, Y Xia, L Wu, D He, T Qin, W Zhou, H Li… - arxiv preprint arxiv …, 2020 - arxiv.org
The recently proposed BERT has shown great power on a variety of natural language
understanding tasks, such as text classification, reading comprehension, etc. However, how …

Variational attention-based interpretable transformer network for rotary machine fault diagnosis

Y Li, Z Zhou, C Sun, X Chen… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Deep learning technology provides a promising approach for rotary machine fault diagnosis
(RMFD), where vibration signals are commonly utilized as input of a deep network model to …

The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?

J Bastings, K Filippova - arxiv preprint arxiv:2010.05607, 2020 - arxiv.org
There is a recent surge of interest in using attention as explanation of model predictions,
with mixed evidence on whether attention can be used as such. While attention conveniently …

GMNN: Graph Markov neural networks

M Qu, Y Bengio, J Tang - International conference on …, 2019 - proceedings.mlr.press
This paper studies semi-supervised object classification in relational data, which is a
fundamental problem in relational data modeling. The problem has been extensively studied …

Fixup initialization: Residual learning without normalization

H Zhang, YN Dauphin, T Ma - arxiv preprint arxiv:1901.09321, 2019 - arxiv.org
Normalization layers are a staple in state-of-the-art deep neural network architectures. They
are widely believed to stabilize training, enable higher learning rate, accelerate …
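(A rough sketch of the core idea, assuming a standard two-convolution residual branch: Fixup removes normalization layers and instead rescales the initialization so that each residual branch starts near the identity. The block below shows only the rescaling and zero-initialization rules; the paper's additional scalar bias and multiplier parameters are omitted, and the layer sizes are illustrative.)

import torch.nn as nn

class FixupStyleBlock(nn.Module):
    # Normalization-free residual block in the spirit of Fixup:
    # the last layer of the branch starts at zero and the first is
    # down-scaled using the total number of residual branches L, so
    # the branch contributes (almost) nothing at initialization.
    def __init__(self, channels, num_blocks_L, m=2):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        nn.init.kaiming_normal_(self.conv1.weight)
        self.conv1.weight.data.mul_(num_blocks_L ** (-1.0 / (2 * m - 2)))  # rescale inner layers
        nn.init.zeros_(self.conv2.weight)                                   # zero the last layer
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.conv2(self.relu(self.conv1(x))))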

Adaptively sparse transformers

GM Correia, V Niculae, AFT Martins - arxiv preprint arxiv:1909.00015, 2019 - arxiv.org
Attention mechanisms have become ubiquitous in NLP. Recent architectures, notably the
Transformer, learn powerful context-aware word representations through layered, multi …
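(The mapping behind this line of work is alpha-entmax, which replaces softmax in the attention weights and can assign exactly zero probability to irrelevant keys. As a stand-in, the sketch below implements only sparsemax, the closed-form alpha = 2 member of that family from Martins & Astudillo (2016); the adaptively sparse Transformer additionally learns alpha per attention head, which this sketch does not do.)

import torch

def sparsemax(scores, dim=-1):
    # Euclidean projection of the scores onto the probability simplex.
    # Unlike softmax, entries below a data-dependent threshold get weight 0.
    z, _ = torch.sort(scores, dim=dim, descending=True)
    cum = z.cumsum(dim) - 1
    k = torch.arange(1, scores.size(dim) + 1, device=scores.device, dtype=scores.dtype)
    shape = [1] * scores.dim()
    shape[dim] = -1
    k = k.view(shape)                                   # reshape for broadcasting along `dim`
    support = (k * z > cum).to(scores.dtype)
    k_z = support.sum(dim=dim, keepdim=True)            # size of the support set
    tau = cum.gather(dim, k_z.long() - 1) / k_z         # threshold value
    return torch.clamp(scores - tau, min=0)

Dropped into the attention sketch above, this amounts to replacing F.softmax(scores, dim=-1) with sparsemax(scores, dim=-1).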

Sequential latent knowledge selection for knowledge-grounded dialogue

B Kim, J Ahn, G Kim - arxiv preprint arxiv:2002.07510, 2020 - arxiv.org
Knowledge-grounded dialogue is a task of generating an informative response based on
both discourse context and external knowledge. As we focus on better modeling the …

Explicit sparse transformer: Concentrated attention through explicit selection

G Zhao, J Lin, Z Zhang, X Ren, Q Su, X Sun - arxiv preprint arxiv …, 2019 - arxiv.org
Self-attention-based Transformers have demonstrated state-of-the-art performance in a
number of natural language processing tasks. Self-attention is able to model long-term …
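(The "explicit selection" in the title refers to keeping, for each query, only the k largest attention scores and masking out the rest before the softmax. A minimal sketch of that idea follows; the function name and the default k are illustrative and make no claim to match the paper's exact formulation.)

import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, top_k=8):
    # Each query attends only to its top-k highest-scoring keys;
    # all other scores are masked out before the softmax.
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5                             # (..., seq_q, seq_k)
    kth = scores.topk(min(top_k, scores.size(-1)), dim=-1).values[..., -1:]   # k-th largest score per query
    scores = scores.masked_fill(scores < kth, float("-inf"))
    return F.softmax(scores, dim=-1) @ v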