Dissociating language and thought in large language models
Large language models (LLMs) have come closest among all models to date to mastering
human language, yet opinions about their linguistic and cognitive capabilities remain split …
Probing classifiers: Promises, shortcomings, and advances
Y Belinkov - Computational Linguistics, 2022 - direct.mit.edu
Probing classifiers have emerged as one of the prominent methodologies for interpreting
and analyzing deep neural network models of natural language processing. The basic idea …
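To illustrate the basic idea behind probing classifiers, the sketch below trains a logistic-regression probe on frozen BERT features to predict part-of-speech tags. The model choice, toy sentences, and tag set are assumptions for illustration, not the survey's own setup:

```python
# Minimal probing-classifier sketch: frozen encoder features -> linear probe.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

def word_vectors(words):
    """Embed a pre-tokenized sentence; keep the first subtoken per word."""
    enc = tok(words, is_split_into_words=True, return_tensors="pt")
    with torch.no_grad():
        hidden = bert(**enc).last_hidden_state[0]   # (seq_len, 768)
    first = {}                                      # word index -> hidden row
    for i, w in enumerate(enc.word_ids()):
        if w is not None and w not in first:
            first[w] = hidden[i]
    return torch.stack([first[w] for w in sorted(first)]).numpy()

# Toy supervision: one (word vector, POS tag) pair per word.
data = [
    (["the", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
    (["a", "dog", "barks"],    ["DET", "NOUN", "VERB"]),
]
X = [v for words, _ in data for v in word_vectors(words)]
y = [t for _, tags in data for t in tags]
probe = LogisticRegression(max_iter=1000).fit(X, y)
print(probe.score(X, y))  # high accuracy suggests POS is linearly decodable
```

High probe accuracy is usually read as evidence that the property is encoded in the representation, though Belinkov's survey discusses why that inference needs care (see also the control-task entry below).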
What Does BERT Look At? An Analysis of BERT's Attention
K Clark - arXiv preprint arXiv:1906.04341, 2019 - fq.pkwyx.com
Large pre-trained neural networks such as BERT have had great recent success in NLP,
motivating a growing body of research investigating what aspects of language they are able …
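Attention maps of the kind this paper analyzes are straightforward to extract with the Hugging Face transformers library. A rough sketch (the sentence and the layer/head indices are arbitrary illustrations, not the paper's specific findings):

```python
# Inspect one attention head: which token does each token attend to most?
import torch
from transformers import AutoTokenizer, AutoModel

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased",
                                 output_attentions=True).eval()

enc = tok("the cat sat on the mat", return_tensors="pt")
with torch.no_grad():
    # Tuple of 12 layers, each of shape (batch, heads, seq_len, seq_len).
    attentions = bert(**enc).attentions

layer, head = 5, 3                       # arbitrary layer/head to inspect
weights = attentions[layer][0, head]     # (seq_len, seq_len) attention matrix
tokens = tok.convert_ids_to_tokens(enc["input_ids"][0])
for i, t in enumerate(tokens):
    j = int(weights[i].argmax())
    print(f"{t:>8} attends most to {tokens[j]}")
```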
What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models
A Ettinger - Transactions of the Association for Computational …, 2020 - direct.mit.edu
Pre-training by language modeling has become a popular and successful approach to NLP
tasks, but we have yet to understand exactly what linguistic capacities these pre-training …
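One of Ettinger's diagnostics probes sensitivity to negation with cloze pairs. A minimal version using the transformers fill-mask pipeline (the model choice is an assumption; the robin example follows the paper's negation test):

```python
# Cloze-style diagnostic: does negation change the model's top completion?
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-uncased")
for prompt in ["A robin is a [MASK].", "A robin is not a [MASK]."]:
    top = fill(prompt)[0]
    print(f"{prompt} -> {top['token_str']} ({top['score']:.2f})")
```

Ettinger reports that masked language models often produce the same completion in both the affirmative and negated frames, indicating insensitivity to negation.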
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Pre-trained language models can be surprisingly adept at tasks they were not explicitly
trained on, but how they implement these capabilities is poorly understood. In this paper, we …
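The behavior the paper interprets is easy to reproduce: given a prompt such as "The war lasted from the year 1732 to the year 17", GPT-2's next-token distribution over two-digit endings should place most of its mass on years greater than 32. A rough sketch (prompt template from the paper; the probability bookkeeping here is simplified):

```python
# Measure GPT-2's probability mass on valid vs. invalid end years.
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

prompt = "The war lasted from the year 1732 to the year 17"
ids = tok(prompt, return_tensors="pt").input_ids
with torch.no_grad():
    probs = torch.softmax(model(ids).logits[0, -1], dim=-1)

def mass(years):
    """Sum probability of endings that are single BPE tokens (most are)."""
    total = 0.0
    for yy in years:
        t = tok.encode(f"{yy:02d}")
        if len(t) == 1:               # skip rare multi-token endings
            total += probs[t[0]].item()
    return total

print(f"P(end > 32)  = {mass(range(33, 100)):.2f}")
print(f"P(end <= 32) = {mass(range(0, 33)):.2f}")
```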
Designing and interpreting probes with control tasks
Probes, supervised models trained to predict properties (like part-of-speech tags) from
representations (like ELMo), have achieved high accuracy on a range of linguistic tasks. But …
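The paper's key device is the control task: the same probe is trained on randomly assigned per-word-type labels, and "selectivity" is defined as real-task accuracy minus control-task accuracy. The sketch below illustrates the computation; the features are synthetic stand-ins (word embedding plus noise), not outputs of a real encoder:

```python
# Control-task sketch: compare probe accuracy on real vs. random labels.
import random
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
words = ["the", "cat", "sleeps", "a", "dog", "barks"] * 50
emb = {w: rng.normal(size=32) for w in set(words)}
X = np.stack([emb[w] + 0.1 * rng.normal(size=32) for w in words])

real = {"the": 0, "a": 0, "cat": 1, "dog": 1, "sleeps": 2, "barks": 2}
y_real = [real[w] for w in words]

table, r = {}, random.Random(0)
# Control task: each word type gets a fixed but random label.
y_ctrl = [table.setdefault(w, r.randrange(3)) for w in words]

acc_real = LogisticRegression(max_iter=1000).fit(X, y_real).score(X, y_real)
acc_ctrl = LogisticRegression(max_iter=1000).fit(X, y_ctrl).score(X, y_ctrl)
print(f"real {acc_real:.2f}  control {acc_ctrl:.2f}  "
      f"selectivity {acc_real - acc_ctrl:.2f}")
```

Because these stand-in features fully encode word identity, the probe fits the control labels too and selectivity is near zero; Hewitt and Liang argue that only a selective probe licenses claims about what the representation encodes.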
Analysis methods in neural language processing: A survey
The field of natural language processing has seen impressive progress in recent years, with
neural network models replacing many of the traditional systems. A plethora of new models …
Masked language modeling and the distributional hypothesis: Order word matters pre-training for little
A possible explanation for the impressive performance of masked language model (MLM)
pre-training is that such models have learned to represent the syntactic structures prevalent …
Interpreting graph neural networks for NLP with differentiable edge masking
Graph neural networks (GNNs) have become a popular approach to integrating structural
inductive biases into NLP models. However, there has been little work on interpreting them …
Compositionality decomposed: How do neural networks generalise?
Despite a multitude of empirical studies, little consensus exists on whether neural networks
are able to generalise compositionally, a controversy that, in part, stems from a lack of …