Dissociating language and thought in large language models
Large language models (LLMs) have come closest among all models to date to mastering
human language, yet opinions about their linguistic and cognitive capabilities remain split …
Recent advances in natural language processing via large pre-trained language models: A survey
Large, pre-trained language models (PLMs) such as BERT and GPT have drastically
changed the Natural Language Processing (NLP) field. For numerous NLP tasks …
Beyond the imitation game: Quantifying and extrapolating the capabilities of language models
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …
Modern language models refute Chomsky's approach to language
ST Piantadosi - From fieldwork to linguistic theory: A tribute to …, 2023 - books.google.com
Modern machine learning has subverted and bypassed the theoretical framework of
Chomsky's generative approach to linguistics, including its core claims to particular insights …
Winoground: Probing vision and language models for visio-linguistic compositionality
We present a novel task and dataset for evaluating the ability of vision and language models
to conduct visio-linguistic compositional reasoning, which we call Winoground. Given two …
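To make the matching setup this snippet describes concrete, below is a minimal sketch of Winoground-style group scoring using CLIP from the HuggingFace transformers library; CLIP and the example file names are illustrative stand-ins, not the paper's sole evaluated model, and the scoring rule only roughly follows the paper's group metric.

```python
# A minimal sketch of Winoground-style group scoring with CLIP
# (an illustrative stand-in; the paper evaluates many vision-language models).
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def group_correct(image_0, image_1, caption_0, caption_1):
    """A group is solved only if every image prefers its own caption
    and every caption prefers its own image."""
    inputs = processor(text=[caption_0, caption_1],
                       images=[image_0, image_1],
                       return_tensors="pt", padding=True)
    s = model(**inputs).logits_per_image  # s[i, j]: image i vs caption j
    image_ok = s[0, 0] > s[0, 1] and s[1, 1] > s[1, 0]
    text_ok = s[0, 0] > s[1, 0] and s[1, 1] > s[0, 1]
    return bool(image_ok and text_ok)

# Hypothetical usage with two local image files:
# print(group_correct(Image.open("a.jpg"), Image.open("b.jpg"),
#                     "a dog chasing a cat", "a cat chasing a dog"))
```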
A survey of data augmentation approaches for NLP
Data augmentation has recently seen increased interest in NLP due to more work in
low-resource domains, new tasks, and the popularity of large-scale neural networks that require …
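As an illustration of one family such surveys cover, here is a minimal sketch of two token-level augmentations (random swap and random deletion, in the style of "Easy Data Augmentation"); this is a toy example, not the survey's own method, and real pipelines add back-translation, paraphrasing, and other families.

```python
# A minimal sketch of two token-level text augmentations: random swap
# and random deletion. Each produces a perturbed copy of the input.
import random

def random_swap(tokens, n_swaps=1):
    """Swap two randomly chosen token positions n_swaps times."""
    tokens = tokens[:]
    for _ in range(n_swaps):
        i, j = random.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens

def random_deletion(tokens, p=0.1):
    """Drop each token with probability p, always keeping at least one."""
    kept = [t for t in tokens if random.random() > p]
    return kept or [random.choice(tokens)]

print(random_swap("the cat sat on the mat".split()))
print(random_deletion("the cat sat on the mat".split()))
```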
The learnability of in-context learning
In-context learning is a surprising and important phenomenon that emerged when modern
language models were scaled to billions of learned parameters. Without modifying a large …
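To make the phenomenon concrete: in-context learning conditions a frozen model on demonstrations concatenated into its input, with no weight updates. A minimal sketch of few-shot prompt construction follows; the prompt format is illustrative, not taken from the paper.

```python
# A minimal sketch of few-shot prompt construction for in-context learning:
# the model's weights are never updated; labeled demonstrations are simply
# concatenated ahead of the query (this exact format is illustrative).
demonstrations = [
    ("The movie was a delight.", "positive"),
    ("I want those two hours back.", "negative"),
]
query = "A quietly devastating, beautiful film."

prompt = "".join(f"Review: {text}\nSentiment: {label}\n\n"
                 for text, label in demonstrations)
prompt += f"Review: {query}\nSentiment:"

print(prompt)
# The completion a frozen LM generates after "Sentiment:" is taken as its
# prediction; changing the demonstrations changes the "task" it performs.
```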
What Does BERT Look At? An Analysis of BERT's Attention
K Clark - arXiv preprint arXiv:1906.04341, 2019 - fq.pkwyx.com
Large pre-trained neural networks such as BERT have had great recent success in NLP,
motivating a growing body of research investigating what aspects of language they are able …
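A minimal sketch of how such attention maps can be extracted with the HuggingFace transformers library follows; this shows only the extraction step, while the paper's analyses of syntax and coreference are built on top of maps like these.

```python
# A minimal sketch of extracting BERT's per-layer, per-head attention maps
# with the HuggingFace transformers library.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)

inputs = tokenizer("The cat sat on the mat.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions holds one tensor per layer, each of shape
# (batch, num_heads, seq_len, seq_len); each row sums to 1 over positions.
layer, head = 0, 0
attn = outputs.attentions[layer][0, head]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
for tok, row in zip(tokens, attn):
    top = row.argmax().item()
    print(f"{tok:>8s} attends most to {tokens[top]}")
```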
Masked language modeling and the distributional hypothesis: Order word matters pre-training for little
A possible explanation for the impressive performance of masked language model (MLM)
pre-training is that such models have learned to represent the syntactic structures prevalent …
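For context on what MLM pre-training does, here is a minimal sketch of BERT-style input corruption following the commonly published recipe (select 15% of positions; replace 80% of those with [MASK], 10% with a random token, and leave 10% unchanged); this illustrates MLM itself, not the paper's word-order experiments.

```python
# A minimal sketch of BERT-style masked language modeling corruption.
# The model is trained to predict the original token at every selected
# position, regardless of how that position was corrupted.
import random

MASK = "[MASK]"

def mlm_corrupt(tokens, vocab, mask_rate=0.15):
    corrupted, targets = tokens[:], {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_rate:
            targets[i] = tok            # position the loss is computed at
            r = random.random()
            if r < 0.8:
                corrupted[i] = MASK     # 80%: replace with [MASK]
            elif r < 0.9:
                corrupted[i] = random.choice(vocab)  # 10%: random token
            # else: 10%: keep the original token
    return corrupted, targets

vocab = "the a cat dog sat ran on mat rug".split()
print(mlm_corrupt("the cat sat on the mat".split(), vocab))
```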
Analyzing multi-head self-attention: Specialized heads do the heavy lifting, the rest can be pruned
Multi-head self-attention is a key component of the Transformer, a state-of-the-art
architecture for neural machine translation. In this work we evaluate the contribution made …
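To show the component whose individual heads the paper evaluates and prunes, here is a minimal NumPy sketch of multi-head scaled dot-product self-attention; the random projection weights are stand-ins for trained parameters.

```python
# A minimal NumPy sketch of multi-head scaled dot-product self-attention.
# Each head projects the input to queries, keys, and values, attends over
# the sequence, and the concatenated head outputs are projected back.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(X, num_heads, rng):
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Per-head query/key/value projections (randomly initialized here).
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
                      for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        weights = softmax(Q @ K.T / np.sqrt(d_head))  # (seq_len, seq_len)
        heads.append(weights @ V)                     # (seq_len, d_head)
    Wo = rng.standard_normal((d_model, d_model)) / np.sqrt(d_model)
    return np.concatenate(heads, axis=-1) @ Wo        # (seq_len, d_model)

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 64))       # 5 tokens, d_model = 64
print(multi_head_self_attention(X, num_heads=8, rng=rng).shape)  # (5, 64)
```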