Dissociating language and thought in large language models

K Mahowald, AA Ivanova, IA Blank, N Kanwisher… - Trends in Cognitive …, 2024 - cell.com
Large language models (LLMs) have come closest among all models to date to mastering
human language, yet opinions about their linguistic and cognitive capabilities remain split …

Probing classifiers: Promises, shortcomings, and advances

Y Belinkov - Computational Linguistics, 2022 - direct.mit.edu
Probing classifiers have emerged as one of the prominent methodologies for interpreting
and analyzing deep neural network models of natural language processing. The basic idea …
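The snippet cuts off before stating the basic idea, so here is a minimal, hedged sketch of a probing classifier: freeze a pretrained encoder, extract representations, and fit a small supervised classifier that predicts a linguistic property from them. The toy sentences, the binary tense labels, and the mean-pooling choice are illustrative assumptions, not details from Belinkov (2022).

# Minimal probing-classifier sketch: a frozen encoder provides features,
# and a linear classifier is trained on top to predict a linguistic property.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

sentences = [
    ("She walks to work.", 0),    # 0 = present tense (toy labels, assumed)
    ("He eats an apple.", 0),
    ("They walked to work.", 1),  # 1 = past tense
    ("She ate an apple.", 1),
]

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

features, labels = [], []
with torch.no_grad():
    for text, label in sentences:
        inputs = tokenizer(text, return_tensors="pt")
        hidden = encoder(**inputs).last_hidden_state            # (1, seq_len, 768)
        features.append(hidden.mean(dim=1).squeeze(0).numpy())  # mean-pool over tokens
        labels.append(label)

# The probe itself: a simple linear classifier on the frozen representations.
probe = LogisticRegression(max_iter=1000).fit(features, labels)
print("probe training accuracy:", probe.score(features, labels))

High probe accuracy is then read as evidence that the property is decodable from the representations; how much such a result actually licenses is the question the survey takes up.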

[PDF] What Does BERT Look At? An Analysis of BERT's Attention

K Clark, U Khandelwal, O Levy, CD Manning - arXiv preprint arXiv:1906.04341, 2019 - arxiv.org
Large pre-trained neural networks such as BERT have had great recent success in NLP,
motivating a growing body of research investigating what aspects of language they are able …
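A hedged sketch of the kind of attention analysis the paper performs: extract per-head attention maps from BERT and measure, for one example sentence, how much each head attends to the immediately previous token. The sentence and the previous-token statistic are illustrative choices rather than the paper's exact evaluation.

# Sketch: extract per-head attention maps and ask how much of each head's
# attention lands on the immediately previous token.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs).attentions   # one (1, num_heads, seq, seq) tensor per layer

for layer_idx, layer_attn in enumerate(attentions):
    attn = layer_attn[0]                      # (num_heads, seq, seq)
    # Subdiagonal entries attn[h, i, i-1] = attention from token i to token i-1.
    prev_mass = torch.diagonal(attn, offset=-1, dim1=-2, dim2=-1).mean(dim=-1)
    head = int(prev_mass.argmax())
    print(f"layer {layer_idx:2d}: head {head:2d} puts {prev_mass[head]:.2f} "
          "of its attention on the previous token on average")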

What BERT is not: Lessons from a new suite of psycholinguistic diagnostics for language models

A Ettinger - Transactions of the Association for Computational …, 2020 - direct.mit.edu
Pre-training by language modeling has become a popular and successful approach to NLP
tasks, but we have yet to understand exactly what linguistic capacities these pre-training …
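A minimal cloze-style check loosely in the spirit of Ettinger's diagnostics: compare BERT's masked-token predictions for an affirmative context and its negated counterpart, and inspect whether the completions change appropriately. The specific sentence pair is an illustrative assumption, not an item from the paper's test suite.

# Cloze diagnostic sketch: compare masked-token predictions for an
# affirmative context and its negated counterpart.
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def top_predictions(text, k=5):
    # Return the k most probable fillers for the [MASK] position in `text`.
    inputs = tokenizer(text, return_tensors="pt")
    mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0][0]
    with torch.no_grad():
        probs = model(**inputs).logits[0, mask_pos].softmax(dim=-1)
    top = torch.topk(probs, k)
    return [(tokenizer.decode([int(i)]), round(float(p), 3))
            for p, i in zip(top.values, top.indices)]

print(top_predictions(f"A robin is a {tokenizer.mask_token}."))
print(top_predictions(f"A robin is not a {tokenizer.mask_token}."))

If the two lists look alike, the model is effectively ignoring the negation in this context, which is the kind of behaviour such diagnostics are designed to surface.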

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model

M Hanna, O Liu, A Variengien - Advances in Neural …, 2024 - proceedings.neurips.cc
Pre-trained language models can be surprisingly adept at tasks they were not explicitly
trained on, but how they implement these capabilities is poorly understood. In this paper, we …
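A hedged sketch of the greater-than task the paper studies: prompt GPT-2 with a span of years missing its end year and compare the probability mass it assigns to end years greater than the start year versus the rest. The prompt template follows the paper's description; restricting to single-token two-digit years is a simplification made here.

# Greater-than sketch: after "...from the year 1732 to the year 17", how much
# probability does GPT-2 place on end years greater than 32?
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

start_yy = 32
prompt = f"The war lasted from the year 17{start_yy} to the year 17"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    next_token_probs = model(**inputs).logits[0, -1].softmax(dim=-1)

greater, not_greater = 0.0, 0.0
for yy in range(100):
    ids = tokenizer(f"{yy:02d}", add_special_tokens=False)["input_ids"]
    if len(ids) != 1:            # skip two-digit years GPT-2 splits into several tokens
        continue
    p = float(next_token_probs[ids[0]])
    if yy > start_yy:
        greater += p
    else:
        not_greater += p

print(f"P(end year > {start_yy}) ~ {greater:.3f}   P(end year <= {start_yy}) ~ {not_greater:.3f}")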

Designing and interpreting probes with control tasks

J Hewitt, P Liang - arXiv preprint arXiv:1909.03368, 2019 - arxiv.org
Probes, supervised models trained to predict properties (like parts-of-speech) from
representations (like ELMo), have achieved high accuracy on a range of linguistic tasks. But …
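A sketch of the control-task protocol under stated assumptions: train the same probe once on the real labels and once on control labels assigned randomly per word type, then report selectivity (real accuracy minus control accuracy). The word-identity-plus-noise features below stand in for frozen contextual representations; they are chosen so that a probe can succeed by memorizing word identity, which is exactly the failure mode selectivity is meant to expose.

# Control-task sketch: the same probe is trained on real labels and on labels
# assigned randomly per word *type*; selectivity = real acc - control acc.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

words  = ["the", "cat", "sat", "on", "a", "mat", "dogs", "ran", "very", "fast"] * 20
labels = ["DET", "NOUN", "VERB", "ADP", "DET", "NOUN", "NOUN", "VERB", "ADV", "ADV"] * 20

# Placeholder "representations": word identity plus noise. A probe can solve
# both the real task and the control task by memorizing word identity.
word_vec = {w: rng.normal(size=64) for w in set(words)}
reps = np.stack([word_vec[w] + 0.1 * rng.normal(size=64) for w in words])

label_set = sorted(set(labels))
control_map = {w: rng.choice(label_set) for w in set(words)}   # fixed random label per type
control_labels = [control_map[w] for w in words]

def probe_accuracy(X, y):
    split = int(0.8 * len(y))
    probe = LogisticRegression(max_iter=1000).fit(X[:split], y[:split])
    return probe.score(X[split:], y[split:])

real_acc = probe_accuracy(reps, labels)
control_acc = probe_accuracy(reps, control_labels)
print(f"real: {real_acc:.2f}  control: {control_acc:.2f}  selectivity: {real_acc - control_acc:.2f}")

A probe that scores well on both tasks has low selectivity, suggesting its accuracy reflects the probe's own capacity to memorize rather than structure genuinely encoded in the representations.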

Analysis methods in neural language processing: A survey

Y Belinkov, J Glass - … of the Association for Computational Linguistics, 2019 - direct.mit.edu
The field of natural language processing has seen impressive progress in recent years, with
neural network models replacing many of the traditional systems. A plethora of new models …

Masked language modeling and the distributional hypothesis: Order word matters pre-training for little

K Sinha, R Jia, D Hupkes, J Pineau, A Williams… - arXiv preprint arXiv …, 2021 - arxiv.org
A possible explanation for the impressive performance of masked language model (MLM)
pre-training is that such models have learned to represent the syntactic structures prevalent …
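One way to make the word-order question concrete, sketched under assumptions not taken from the paper: score a sentence and a word-shuffled version of it with a masked language model using pseudo-log-likelihood (mask each position in turn and sum the log-probabilities of the original tokens). The paper's experiments go further, pre-training on shuffled corpora rather than merely scoring shuffled inputs; this is only a quick order-sensitivity check.

# Word-order sketch: pseudo-log-likelihood of a sentence vs. a shuffled version.
import random
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def pseudo_log_likelihood(text):
    # Mask each position in turn and sum the log-probability the MLM assigns
    # to the original token there (higher = more "natural" to the model).
    enc = tokenizer(text, return_tensors="pt")
    ids = enc["input_ids"][0]
    total = 0.0
    for pos in range(1, len(ids) - 1):            # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[pos] = tokenizer.mask_token_id
        with torch.no_grad():
            logits = model(input_ids=masked.unsqueeze(0),
                           attention_mask=enc["attention_mask"]).logits
        total += float(torch.log_softmax(logits[0, pos], dim=-1)[ids[pos]])
    return total

sentence = "the quick brown fox jumps over the lazy dog"
shuffled = sentence.split()
random.shuffle(shuffled)
print("original:", pseudo_log_likelihood(sentence))
print("shuffled:", pseudo_log_likelihood(" ".join(shuffled)))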

Interpreting graph neural networks for NLP with differentiable edge masking

MS Schlichtkrull, N De Cao, I Titov - arXiv preprint arXiv:2010.00577, 2020 - arxiv.org
Graph neural networks (GNNs) have become a popular approach to integrating structural
inductive biases into NLP models. However, there has been little work on interpreting them …
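A generic sketch of differentiable edge masking, with a toy graph and a frozen one-layer message-passing function standing in for a trained GNN: learn a sigmoid gate per edge so that the model's output stays close to the original while a sparsity penalty pushes unneeded gates toward zero. The graph, the layer, and the loss weights are illustrative assumptions, not the paper's setup.

# Differentiable edge-masking sketch on a toy graph with a frozen, random
# one-layer message-passing function standing in for a trained GNN.
import torch

torch.manual_seed(0)
num_nodes, dim = 5, 8
x = torch.randn(num_nodes, dim)                                 # node features
edges = torch.tensor([[0, 1], [1, 2], [2, 3], [3, 4], [0, 4]])  # (src, dst) pairs
W = torch.randn(dim, dim)                                       # frozen "trained" weight

def gnn_layer(x, edges, edge_weights):
    # Sum-aggregation message passing via a gated dense adjacency matrix:
    # node dst receives edge_weight * (x[src] @ W) from each incoming edge.
    adj = torch.zeros(x.shape[0], x.shape[0])
    adj = adj.index_put((edges[:, 1], edges[:, 0]), edge_weights)
    return torch.relu(adj @ (x @ W))

with torch.no_grad():
    original = gnn_layer(x, edges, torch.ones(len(edges)))      # output with all edges kept

# One learnable logit per edge; sigmoid turns it into a soft "keep" gate.
mask_logits = torch.zeros(len(edges), requires_grad=True)
optimizer = torch.optim.Adam([mask_logits], lr=0.1)

for step in range(200):
    gates = torch.sigmoid(mask_logits)
    masked = gnn_layer(x, edges, gates)
    # Stay close to the original output, but pay a sparsity cost per open gate.
    loss = torch.nn.functional.mse_loss(masked, original) + 0.05 * gates.sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

print("learned edge gates:", torch.sigmoid(mask_logits).detach())

Edges whose gates stay near one are the ones the explanation keeps; edges driven toward zero are judged unnecessary for reproducing the prediction.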

Compositionality decomposed: How do neural networks generalise?

D Hupkes, V Dankers, M Mul, E Bruni - Journal of Artificial Intelligence …, 2020 - jair.org
Despite a multitude of empirical studies, little consensus exists on whether neural networks
are able to generalise compositionally, a controversy that, in part, stems from a lack of …