Language model behavior: A comprehensive survey

TA Chang, BK Bergen - Computational Linguistics, 2024 - direct.mit.edu
Transformer language models have received widespread public attention, yet their
generated text is often surprising even to NLP researchers. In this survey, we discuss over …

Analysis methods in neural language processing: A survey

Y Belinkov, J Glass - … of the Association for Computational Linguistics, 2019 - direct.mit.edu
The field of natural language processing has seen impressive progress in recent years, with
neural network models replacing many of the traditional systems. A plethora of new models …

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arxiv preprint arxiv …, 2022 - arxiv.org
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

What artificial neural networks can tell us about human language acquisition

A Warstadt, SR Bowman - Algebraic structures in natural …, 2022 - taylorfrancis.com
Rapid progress in machine learning for natural language processing has the potential to
transform debates about how humans learn language. However, the learning environments …

Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference

RT McCoy, E Pavlick, T Linzen - arxiv preprint arxiv:1902.01007, 2019 - arxiv.org
A machine learning system can score well on a given test set by relying on heuristics that
are effective for frequent example types but break down in more challenging cases. We …

Syntactic structure from deep learning

T Linzen, M Baroni - Annual Review of Linguistics, 2021 - annualreviews.org
Modern deep neural networks achieve impressive performance in engineering applications
that require extensive linguistic skills, such as machine translation. This success has …

Open sesame: Getting inside BERT's linguistic knowledge

Y Lin, YC Tan, R Frank - arxiv preprint arxiv:1906.01698, 2019 - arxiv.org
How and to what extent does BERT encode syntactically-sensitive hierarchical information
or positionally-sensitive linear information? Recent work has shown that contextual …

What do RNN language models learn about filler-gap dependencies?

E Wilcox, R Levy, T Morita, R Futrell - arxiv preprint arxiv:1809.00042, 2018 - arxiv.org
RNN language models have achieved state-of-the-art perplexity results and have proven
useful in a suite of NLP tasks, but it is as yet unclear what syntactic generalizations they …

Learning which features matter: RoBERTa acquires a preference for linguistic generalizations (eventually)

A Warstadt, Y Zhang, HS Li, H Liu… - arxiv preprint arxiv …, 2020 - arxiv.org
One reason pretraining on self-supervised linguistic tasks is effective is that it teaches
models features that are helpful for language understanding. However, we want pretrained …

BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance

RT McCoy, J Min, T Linzen - arxiv preprint arxiv:1911.02969, 2019 - arxiv.org
If the same neural network architecture is trained multiple times on the same dataset, will it
make similar linguistic generalizations across runs? To study this question, we fine-tuned …