Attention mechanism in neural networks: where it comes and where it goes

D Soydaner - Neural Computing and Applications, 2022 - Springer
A long time ago in the machine learning literature, the idea of incorporating a mechanism
inspired by the human visual system into neural networks was introduced. This idea is …
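As background for entries like this one, here is a minimal NumPy sketch of scaled dot-product attention, the variant popularized by Transformers; the function name and array shapes are illustrative assumptions, not code from the survey.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q, K: (n, d_k); V: (n, d_v). Each output row is a weighted mix of V's rows.
    scores = Q @ K.T / np.sqrt(K.shape[-1])              # query-key similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                   # softmax over key positions
    return w @ V

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out = scaled_dot_product_attention(Q, K, V)              # shape (4, 8)
```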

On the explainability of natural language processing deep models

JE Zini, M Awad - ACM Computing Surveys, 2022 - dl.acm.org
Despite their success, deep networks are used as black-box models with outputs that are not
easily explainable during the learning and the prediction phases. This lack of interpretability …
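As a concrete instance of one method family such surveys cover, a toy gradient-based saliency sketch in PyTorch; the embedding layer, linear classifier, and token ids are hypothetical stand-ins for a real model.

```python
import torch

emb = torch.nn.Embedding(100, 16)                # stand-in embedding layer
clf = torch.nn.Linear(16, 2)                     # stand-in classifier head
tokens = torch.tensor([[5, 42, 7]])              # a toy tokenized input

x = emb(tokens).detach().requires_grad_(True)    # treat embeddings as the input
score = clf(x.mean(dim=1))[0, 1]                 # logit of the class to explain
score.backward()
saliency = x.grad.norm(dim=-1)                   # per-token importance scores
```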

Quantifying attention flow in transformers

S Abnar, W Zuidema - arXiv preprint arXiv:2005.00928, 2020 - arxiv.org
In the Transformer model, "self-attention" combines information from attended embeddings
into the representation of the focal embedding in the next layer. Thus, across layers of the …
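A sketch of attention rollout, one of the two methods the paper proposes, assuming per-layer attention matrices already averaged over heads; mixing each matrix equally with the identity models the residual connections, following the paper's recipe.

```python
import numpy as np

def attention_rollout(attentions):
    # attentions: list of (seq_len, seq_len) matrices, lowest layer first.
    n = attentions[0].shape[0]
    rollout = np.eye(n)
    for a in attentions:
        a_res = 0.5 * a + 0.5 * np.eye(n)           # account for residual connections
        a_res /= a_res.sum(axis=-1, keepdims=True)  # re-normalize rows
        rollout = a_res @ rollout                   # compose with lower layers
    return rollout  # rollout[i, j]: estimated flow from input token j to position i
```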

Attention is not not explanation

S Wiegreffe, Y Pinter - arXiv preprint arXiv:1908.04626, 2019 - arxiv.org
Attention mechanisms play a central role in NLP systems, especially within recurrent neural
network (RNN) models. Recently, there has been increasing interest in whether or not the …

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …
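To make one such mechanism concrete, a NumPy sketch of additive (Bahdanau-style) attention; the parameter shapes are assumptions chosen for illustration, and the keys double as values here for brevity.

```python
import numpy as np

def additive_attention(query, keys, W_q, W_k, v):
    # query: (d_q,); keys: (n, d_k); W_q: (d_a, d_q); W_k: (d_a, d_k); v: (d_a,).
    scores = np.tanh(query @ W_q.T + keys @ W_k.T) @ v   # one score per position
    w = np.exp(scores - scores.max())
    w /= w.sum()                                         # softmax over positions
    return w @ keys, w                                   # context vector, weights
```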

A multiscale visualization of attention in the transformer model

J Vig - arXiv preprint arXiv:1906.05714, 2019 - arxiv.org
The Transformer is a sequence model that forgoes traditional recurrent architectures in favor
of a fully attention-based approach. Besides improving performance, an advantage of using …
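The raw material such visualizations consume can be extracted from a Hugging Face Transformers model roughly as below; the checkpoint and the layer/head indices are arbitrary choices, and this is a generic sketch rather than the paper's own tooling.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tok("the cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.attentions: one (batch, heads, seq, seq) tensor per layer
head = out.attentions[5][0, 3]                    # e.g. layer 5, head 3
tokens = tok.convert_ids_to_tokens(inputs["input_ids"][0])
```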

XNLI: Evaluating cross-lingual sentence representations

A Conneau, G Lample, R Rinott, A Williams… - arXiv preprint arXiv …, 2018 - arxiv.org
State-of-the-art natural language processing systems rely on supervision in the form of
annotated data to learn competent models. These models are generally trained on data in a …
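For orientation, a loading sketch assuming the `xnli` dataset published on the Hugging Face Hub with per-language configurations; the field names follow its dataset card.

```python
from datasets import load_dataset

xnli_fr = load_dataset("xnli", "fr", split="validation")
ex = xnli_fr[0]
# label: 0 = entailment, 1 = neutral, 2 = contradiction
print(ex["premise"], ex["hypothesis"], ex["label"])
```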

Improving language understanding by generative pre-training

A Radford, K Narasimhan, T Salimans, I Sutskever - 2018 - mikecaptain.com
Natural language understanding comprises a wide range of diverse tasks such as textual
entailment, question answering, semantic similarity assessment, and document …
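The pre-training step behind this approach is standard left-to-right language modeling: each position predicts the next token. A toy PyTorch sketch of that loss, with random placeholder logits standing in for a Transformer decoder's outputs.

```python
import torch
import torch.nn.functional as F

vocab, seq = 100, 8
tokens = torch.randint(vocab, (1, seq))        # a toy token sequence
logits = torch.randn(1, seq, vocab)            # placeholder model outputs

# Shift by one so position t predicts token t+1, then apply cross-entropy.
loss = F.cross_entropy(logits[:, :-1].reshape(-1, vocab),
                       tokens[:, 1:].reshape(-1))
```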

Attention in natural language processing

A Galassi, M Lippi, P Torroni - IEEE Transactions on Neural …, 2020 - ieeexplore.ieee.org
Attention is an increasingly popular mechanism used in a wide range of neural
architectures. The mechanism itself has been realized in a variety of formats. However …
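One way to see the "variety of formats" point: most variants share a softmax-weighted-average template and differ mainly in the scoring function, as in this illustrative sketch.

```python
import numpy as np

def attend(query, keys, values, score):
    # Generic template: normalized scores weight the value vectors.
    s = np.array([score(query, k) for k in keys])
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ values

dot = lambda q, k: q @ k                          # dot-product scoring
scaled = lambda q, k: (q @ k) / np.sqrt(k.size)   # scaled dot-product scoring
```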

Adversarial attacks on deep-learning models in natural language processing: A survey

WE Zhang, QZ Sheng, A Alhazmi, C Li - ACM Transactions on Intelligent …, 2020 - dl.acm.org
With the development of high-performance computing devices, deep neural networks (DNNs) have in
recent years gained significant popularity in many Artificial Intelligence (AI) …
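As a flavor of the attack families such surveys catalogue, a toy character-swap perturbation; real attacks are usually guided by model gradients or query feedback rather than random edits, and this function is a hypothetical illustration.

```python
import random

def char_swap_attack(text, rate=0.1, seed=0):
    # Randomly transpose adjacent letters, a typo-style perturbation.
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

print(char_swap_attack("the movie was surprisingly good", rate=0.3))
```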