Attention in natural language processing

A Galassi, M Lippi, P Torroni - IEEE transactions on neural …, 2020 - ieeexplore.ieee.org
Attention is an increasingly popular mechanism used in a wide range of neural
architectures. The mechanism itself has been realized in a variety of formats. However …
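As a point of reference for the variety of formats the survey covers, here is a minimal NumPy sketch of scaled dot-product attention, the form most variants build on; the function and shapes are illustrative, not drawn from the survey itself.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) compatibility scores
    weights = softmax(scores, axis=-1)   # each query's weights sum to 1
    return weights @ V                   # weighted average of the values

rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 4))              # 2 queries of dimension 4
K = rng.normal(size=(3, 4))              # 3 keys
V = rng.normal(size=(3, 4))              # 3 matching values
print(attention(Q, K, V).shape)          # (2, 4): one output per query
```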

Energy and policy considerations for modern deep learning research

E Strubell, A Ganesh, A McCallum - … of the AAAI conference on artificial …, 2020 - ojs.aaai.org
The field of artificial intelligence has experienced a dramatic methodological shift towards
large neural networks trained on plentiful data. This shift has been fueled by recent …

What does BERT look at? An analysis of BERT's attention

K Clark, U Khandelwal, O Levy, CD Manning - arXiv preprint arXiv …, 2019 - arxiv.org
Large pre-trained neural networks such as BERT have had great recent success in NLP,
motivating a growing body of research investigating what aspects of language they are able …
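This kind of inspection can be approximated with the Hugging Face transformers library (a stand-in here, not the authors' original code): request the per-head attention maps from a pre-trained BERT and look at where each token attends.

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_attentions=True)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
layer, head = 0, 0
attn = outputs.attentions[layer][0, head]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for i, tok in enumerate(tokens):
    j = attn[i].argmax().item()
    print(f"{tok:>8} attends most to {tokens[j]}")
```

Even a small probe like this surfaces the kinds of patterns the paper reports, such as heads that attend heavily to delimiter tokens like [SEP] or to a token's immediate neighbors.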

Multimodal transformer for unaligned multimodal language sequences

YHH Tsai, S Bai, PP Liang, JZ Kolter… - Proceedings of the …, 2019 - pmc.ncbi.nlm.nih.gov
Human language is often multimodal, comprising a mixture of natural language,
facial gestures, and acoustic behaviors. However, two major challenges in modeling such …
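At the heart of the Multimodal Transformer (MulT) is crossmodal attention: one modality's features act as queries while another modality supplies the keys and values, so the two sequences need not be time-aligned. A minimal sketch using PyTorch's built-in attention module (illustrative dimensions, not the authors' implementation):

```python
import torch
import torch.nn as nn

d_model, n_heads = 32, 4
cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

text  = torch.randn(1, 10, d_model)   # 10 text tokens
audio = torch.randn(1, 25, d_model)   # 25 acoustic frames, unaligned with the text

# Text queries attend over the audio sequence: the output keeps the text
# sequence's length but mixes in audio information.
fused, weights = cross_attn(query=text, key=audio, value=audio)
print(fused.shape, weights.shape)  # torch.Size([1, 10, 32]) torch.Size([1, 10, 25])
```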

Are sixteen heads really better than one?

P Michel, O Levy, G Neubig - Advances in neural …, 2019 - proceedings.neurips.cc
Multi-headed attention is a driving force behind recent state-of-the-art NLP models. By
applying multiple attention mechanisms in parallel, it can express sophisticated functions …
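The paper's central experiment can be sketched as a head-ablation test: zero out one head of a multi-head attention layer and measure how much the output changes. The implementation below is illustrative, not the authors' code; all weights are random stand-ins.

```python
import torch

def multi_head_attention(x, W_qkv, W_o, n_heads, head_mask=None):
    """x: (seq, d_model); head_mask: (n_heads,) of 0/1 entries to ablate heads."""
    seq, d_model = x.shape
    d_head = d_model // n_heads
    q, k, v = (x @ W_qkv).chunk(3, dim=-1)              # each (seq, d_model)
    split = lambda t: t.view(seq, n_heads, d_head).transpose(0, 1)
    q, k, v = split(q), split(k), split(v)              # (n_heads, seq, d_head)
    scores = (q @ k.transpose(-2, -1)) / d_head ** 0.5  # (n_heads, seq, seq)
    out = scores.softmax(dim=-1) @ v                    # (n_heads, seq, d_head)
    if head_mask is not None:                           # zero out ablated heads
        out = out * head_mask.view(n_heads, 1, 1)
    out = out.transpose(0, 1).reshape(seq, d_model)     # concatenate heads
    return out @ W_o

torch.manual_seed(0)
d_model, n_heads, seq = 16, 4, 5
x = torch.randn(seq, d_model)
W_qkv = torch.randn(d_model, 3 * d_model)
W_o = torch.randn(d_model, d_model)

full = multi_head_attention(x, W_qkv, W_o, n_heads)
mask = torch.ones(n_heads)
mask[0] = 0.0                                           # ablate head 0
ablated = multi_head_attention(x, W_qkv, W_o, n_heads, head_mask=mask)
print((full - ablated).abs().mean())                    # impact of removing one head
```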

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer
In humans, attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Natural language processing advancements by deep learning: A survey

A Torfi, RA Shirvani, Y Keneshloo, N Tavaf… - arXiv preprint arXiv …, 2020 - arxiv.org
Natural Language Processing (NLP) empowers intelligent machines by improving their
understanding of human language for linguistic-based human-computer …

What do you learn from context? Probing for sentence structure in contextualized word representations

I Tenney, P Xia, B Chen, A Wang, A Poliak… - arXiv preprint arXiv …, 2019 - arxiv.org
Contextualized representation models such as ELMo (Peters et al., 2018a) and BERT
(Devlin et al., 2018) have recently achieved state-of-the-art results on a diverse array of …
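The probing setup the paper uses can be sketched in a few lines: a small linear classifier is trained on frozen contextual representations to test what structure they linearly encode. Here synthetic vectors stand in for ELMo/BERT features, and the label set is an illustrative placeholder.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_tokens, dim = 200, 64
# Pretend these came from a frozen encoder; the probe never updates them.
features = rng.normal(size=(n_tokens, dim))
labels = rng.integers(0, 3, size=n_tokens)       # e.g. NOUN / VERB / OTHER

probe = LogisticRegression(max_iter=1000).fit(features[:150], labels[:150])
print("probe accuracy:", probe.score(features[150:], labels[150:]))
# High held-out accuracy would suggest the representation linearly encodes
# the label; with random features, accuracy stays near chance.
```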

Simple BERT models for relation extraction and semantic role labeling

P Shi, J Lin - arXiv preprint arXiv:1904.05255, 2019 - arxiv.org
We present simple BERT-based models for relation extraction and semantic role labeling. In
recent years, state-of-the-art performance has been achieved using neural models by …
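The "simple BERT" recipe, with labeling tasks cast as token classification, amounts to a pre-trained encoder plus a single classification layer. A sketch using Hugging Face transformers as a stand-in for the authors' implementation:

```python
import torch
from transformers import BertForTokenClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# num_labels is a placeholder; the classification head is freshly initialized
# here and would be fine-tuned on labeled SRL or relation data in practice.
model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=5)
model.eval()

inputs = tokenizer("The cat chased the mouse.", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits          # (1, seq_len, num_labels)
print(logits.argmax(-1))                     # predicted tag id per wordpiece
```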

HIBERT: Document-level pre-training of hierarchical bidirectional transformers for document summarization

X Zhang, F Wei, M Zhou - arXiv preprint arXiv:1905.06566, 2019 - arxiv.org
Neural extractive summarization models usually employ a hierarchical encoder for
document encoding and are trained using sentence-level labels, which are created …
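The hierarchical setup the paper builds on can be sketched as follows: a sentence-level encoder pools word vectors into sentence vectors, a document-level encoder contextualizes them, and a classifier produces one keep/drop score per sentence. Layer sizes and names below are illustrative, not HIBERT's configuration.

```python
import torch
import torch.nn as nn

class HierarchicalExtractor(nn.Module):
    def __init__(self, vocab=1000, d=32):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.sent_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), 1)
        self.doc_enc = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True), 1)
        self.clf = nn.Linear(d, 1)

    def forward(self, doc):                  # doc: (n_sents, n_words) token ids
        words = self.sent_enc(self.emb(doc)) # (n_sents, n_words, d)
        sents = words.mean(dim=1)            # pool words -> sentence vectors
        sents = self.doc_enc(sents.unsqueeze(0)).squeeze(0)
        return self.clf(sents).squeeze(-1)   # one extraction score per sentence

doc = torch.randint(0, 1000, (5, 12))        # 5 sentences, 12 tokens each
scores = HierarchicalExtractor()(doc)
print(scores.shape)                          # torch.Size([5])
```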