Analysis methods in neural language processing: A survey

Y Belinkov, J Glass - … of the Association for Computational Linguistics, 2019 - direct.mit.edu
The field of natural language processing has seen impressive progress in recent years, with
neural network models replacing many of the traditional systems. A plethora of new models …

Theoretical limitations of self-attention in neural sequence models

M Hahn - Transactions of the Association for Computational …, 2020 - direct.mit.edu
Transformers are emerging as the new workhorse of NLP, showing great success across
tasks. Unlike LSTMs, transformers process input sequences entirely through self-attention …

Thinking like transformers

G Weiss, Y Goldberg, E Yahav - International Conference on …, 2021 - proceedings.mlr.press
What is the computational model behind a Transformer? Where recurrent neural networks
have direct parallels in finite state machines, allowing clear discussion and thought around …

Self-attention networks can process bounded hierarchical languages

S Yao, B Peng, C Papadimitriou… - arXiv preprint arXiv …, 2021 - arxiv.org
Despite their impressive performance in NLP, self-attention networks were recently proved
to be limited for processing formal languages with hierarchical structure, such as $\mathsf …

Do neural models learn systematicity of monotonicity inference in natural language?

H Yanaka, K Mineshima, D Bekki, K Inui - arXiv preprint arXiv:2004.14839, 2020 - arxiv.org
Despite the success of language models using neural networks, it remains unclear to what
extent neural models have the generalization ability to perform inferences. In this paper, we …

How can self-attention networks recognize Dyck-n languages?

J Ebrahimi, D Gelda, W Zhang - arXiv preprint arXiv:2010.04303, 2020 - arxiv.org
We focus on the recognition of Dyck-n ($\mathcal{D}_n$) languages with self-attention
(SA) networks, which has been deemed to be a difficult task for these networks. We compare …
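
As a point of reference for the Dyck-n recognition task that several of these entries study: Dyck-n is the language of well-nested strings over n types of bracket pairs. The short Python sketch below is illustrative only; the function name is_dyck_n and its interface are assumptions, not taken from any of the cited papers.

    def is_dyck_n(s, pairs):
        """Return True iff s is well-nested over the given bracket pairs.

        pairs maps each opening bracket to its closing bracket,
        e.g. {'(': ')', '[': ']'} for Dyck-2.
        """
        closers = {close: open_ for open_, close in pairs.items()}
        stack = []
        for ch in s:
            if ch in pairs:                 # opening bracket: push it
                stack.append(ch)
            elif ch in closers:             # closing bracket: must match the top of the stack
                if not stack or stack[-1] != closers[ch]:
                    return False
                stack.pop()
            else:
                return False                # symbol outside the bracket alphabet
        return not stack                    # accept only if every bracket was closed

    assert is_dyck_n("([])()", {'(': ')', '[': ']'})
    assert not is_dyck_n("([)]", {'(': ')', '[': ']'})

The explicit stack is the unbounded memory that a fixed-size network has to approximate, which is why bounded-depth variants of Dyck (as in the Yao et al. entry above) are a common testbed.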

Evaluating the ability of LSTMs to learn context-free grammars

L Sennhauser, RC Berwick - arXiv preprint arXiv:1811.02611, 2018 - arxiv.org
While long short-term memory (LSTM) neural net architectures are designed to capture
sequence information, human language is generally composed of hierarchical structures …

Memory-augmented recurrent neural networks can learn generalized Dyck languages

M Suzgun, S Gehrmann, Y Belinkov… - arXiv preprint arXiv …, 2019 - arxiv.org
We introduce three memory-augmented Recurrent Neural Networks (MARNNs) and explore
their capabilities on a series of simple language modeling tasks whose solutions require …

Formal and empirical studies of counting behaviour in ReLU RNNs

N El-Naggar, A Ryzhikov, L Daviaud… - International …, 2023 - proceedings.mlr.press
In recent years, the discussion about systematicity of neural network learning has gained
renewed interest, in particular the formal analysis of neural network behaviour. In this paper …
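
To make the counting behaviour studied here concrete, the following is a minimal one-unit sketch, not code from the paper; relu_depth_counter is a hypothetical name and the weights are the textbook choice of +1 for an opening bracket and -1 for a closing one.

    def relu_depth_counter(tokens, w_open=1.0, w_close=-1.0):
        """Run one ReLU recurrent unit over a bracket string.

        h_t = ReLU(h_{t-1} + w_open * [tok == '('] + w_close * [tok == ')'])
        tracks the current nesting depth of a well-formed prefix. Because ReLU
        clips at zero, this unit alone cannot remember that the depth ever tried
        to go negative: ")(" and "()" both end with h = 0.
        """
        h = 0.0
        trace = []
        for tok in tokens:
            h = max(0.0, h + (w_open if tok == "(" else w_close))  # ReLU recurrence
            trace.append(h)
        return trace

    print(relu_depth_counter("(())"))  # [1.0, 2.0, 1.0, 0.0]
    print(relu_depth_counter(")("))    # [0.0, 1.0] -- the illegal dip below zero is lost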

Learning the Dyck language with attention-based Seq2Seq models

X Yu, NT Vu, J Kuhn - Proceedings of the 2019 ACL Workshop …, 2019 - aclanthology.org
The generalized Dyck language has been used to analyze the ability of Recurrent Neural
Networks (RNNs) to learn context-free grammars (CFGs). Recent studies draw conflicting …