Probing classifiers: Promises, shortcomings, and advances

Y Belinkov - Computational Linguistics, 2022 - direct.mit.edu
Probing classifiers have emerged as one of the prominent methodologies for interpreting
and analyzing deep neural network models of natural language processing. The basic idea …
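For context, a minimal sketch of the basic probing setup (illustrative only; not code from the cited survey): representations from a frozen pretrained model are treated as fixed features, and a simple supervised classifier is trained on them to predict a linguistic property, with the probe's accuracy read as evidence about what the representations encode. The random feature matrix, the label set, and the use of scikit-learn's LogisticRegression below are assumptions made for the sketch.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Stand-ins for hidden states extracted from a frozen language model (X)
# and gold labels for a linguistic property, e.g. 17 POS classes (y).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))
y = rng.integers(0, 17, size=1000)

# The probe itself: a simple classifier trained on top of the frozen features.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("probe accuracy:", probe.score(X_test, y_test))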

Masked language modeling and the distributional hypothesis: Order word matters pre-training for little

K Sinha, R Jia, D Hupkes, J Pineau, A Williams… - arXiv preprint arXiv …, 2021 - arxiv.org
A possible explanation for the impressive performance of masked language model (MLM)
pre-training is that such models have learned to represent the syntactic structures prevalent …

Implicit representations of meaning in neural language models

BZ Li, M Nye, J Andreas - arXiv preprint arXiv:2106.00737, 2021 - arxiv.org
Does the effectiveness of neural language models derive entirely from accurate modeling of
surface word co-occurrence statistics, or do these models represent and reason about the …

When do you need billions of words of pretraining data?

Y Zhang, A Warstadt, HS Li, SR Bowman - arXiv preprint arXiv:2011.04946, 2020 - arxiv.org
NLP is currently dominated by general-purpose pretrained language models like RoBERTa,
which achieve strong performance on NLU tasks through pretraining on billions of words …

Schrödinger's tree—On syntax and neural language models

A Kulmizev, J Nivre - Frontiers in Artificial Intelligence, 2022 - frontiersin.org
In the last half-decade, the field of natural language processing (NLP) has undergone two
major transitions: the switch to neural networks as the primary modeling paradigm and the …

Can language models encode perceptual structure without grounding? A case study in color

M Abdou, A Kulmizev, D Hershcovich, S Frank… - arXiv preprint arXiv …, 2021 - arxiv.org
Pretrained language models have been shown to encode relational information, such as the
relations between entities or concepts in knowledge bases, e.g., (Paris, Capital, France) …

Sudden drops in the loss: Syntax acquisition, phase transitions, and simplicity bias in MLMs

A Chen, R Shwartz-Ziv, K Cho, ML Leavitt… - arXiv preprint arXiv …, 2023 - arxiv.org
Most interpretability research in NLP focuses on understanding the behavior and features of
a fully trained model. However, certain insights into model behavior may only be accessible …

Word order does matter and shuffled language models know it

M Abdou, V Ravishankar, A Kulmizev… - Proceedings of the 60th …, 2022 - aclanthology.org
Recent studies have shown that language models pretrained and/or fine-tuned on randomly
permuted sentences exhibit competitive performance on GLUE, putting into question the …

Probing for the usage of grammatical number

K Lasri, T Pimentel, A Lenci, T Poibeau… - arXiv preprint arXiv …, 2022 - arxiv.org
A central quest of probing is to uncover how pre-trained models encode a linguistic property
within their representations. An encoding, however, might be spurious, i.e., the model might …

Language models as models of language

R Millière - arXiv preprint arXiv:2408.07144, 2024 - arxiv.org
This chapter critically examines the potential contributions of modern language models to
theoretical linguistics. Despite their focus on engineering goals, these models' ability to …