Modern language models refute Chomsky's approach to language

ST Piantadosi - From fieldwork to linguistic theory: A tribute to …, 2023 - books.google.com
Modern machine learning has subverted and bypassed the theoretical framework of
Chomsky's generative approach to linguistics, including its core claims to particular insights …

Pre-trained models: Past, present and future

X Han, Z Zhang, N Ding, Y Gu, X Liu, Y Huo, J Qiu… - AI Open, 2021 - Elsevier
Large-scale pre-trained models (PTMs) such as BERT and GPT have recently achieved
great success and become a milestone in the field of artificial intelligence (AI). Owing to …

A primer in BERTology: What we know about how BERT works

A Rogers, O Kovaleva, A Rumshisky - Transactions of the Association …, 2021 - direct.mit.edu
Transformer-based models have pushed the state of the art in many areas of NLP, but our
understanding of what is behind their success is still limited. This paper is the first survey of …

Vision transformers provably learn spatial structure

S Jelassi, M Sander, Y Li - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Abstract Vision Transformers (ViTs) have recently achieved comparable or superior
performance to convolutional neural networks (CNNs) in computer vision. This empirical …

What artificial neural networks can tell us about human language acquisition

A Warstadt, SR Bowman - Algebraic structures in natural …, 2022 - taylorfrancis.com
Rapid progress in machine learning for natural language processing has the potential to
transform debates about how humans learn language. However, the learning environments …

Learning which features matter: RoBERTa acquires a preference for linguistic generalizations (eventually)

A Warstadt, Y Zhang, HS Li, H Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
One reason pretraining on self-supervised linguistic tasks is effective is that it teaches
models features that are helpful for language understanding. However, we want pretrained …

BabyBERTa: Learning more grammar with small-scale child-directed language

PA Huebner, E Sulem, C Fisher… - Proceedings of the 25th …, 2021 - aclanthology.org
Transformer-based language models have taken the NLP world by storm. However, their
potential for addressing important questions in language acquisition research has been …

BERTs of a feather do not generalize together: Large variability in generalization across models with similar test set performance

RT McCoy, J Min, T Linzen - arXiv preprint arXiv:1911.02969, 2019 - arxiv.org
If the same neural network architecture is trained multiple times on the same dataset, will it
make similar linguistic generalizations across runs? To study this question, we fine-tuned …

Probing for the usage of grammatical number

K Lasri, T Pimentel, A Lenci, T Poibeau… - arXiv preprint arXiv …, 2022 - arxiv.org
A central quest of probing is to uncover how pre-trained models encode a linguistic property
within their representations. An encoding, however, might be spurious, i.e., the model might …

Language models as models of language

R Millière - arXiv preprint arXiv:2408.07144, 2024 - arxiv.org
This chapter critically examines the potential contributions of modern language models to
theoretical linguistics. Despite their focus on engineering goals, these models' ability to …