Faith and fate: Limits of transformers on compositionality

N Dziri, X Lu, M Sclar, XL Li, L Jiang… - Advances in …, 2023 - proceedings.neurips.cc
Transformer large language models (LLMs) have sparked admiration for their exceptional
performance on tasks that demand intricate multi-step reasoning. Yet, these models …

Exploring length generalization in large language models

C Anil, Y Wu, A Andreassen… - Advances in …, 2022 - proceedings.neurips.cc
The ability to extrapolate from short problem instances to longer ones is an important form of
out-of-distribution generalization in reasoning tasks, and is crucial when learning from …

A taxonomy and review of generalization research in NLP

D Hupkes, M Giulianelli, V Dankers, M Artetxe… - Nature Machine …, 2023 - nature.com
The ability to generalize well is one of the primary desiderata for models of natural language
processing (NLP), but what 'good generalization' entails and how it should be evaluated is …

Foundation models for music: A survey

Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, foundation models (FMs) such as large language models (LLMs) and latent
diffusion models (LDMs) have profoundly impacted diverse sectors, including music. This …

Efficient methods for natural language processing: A survey

M Treviso, JU Lee, T Ji, B Aken, Q Cao… - Transactions of the …, 2023 - direct.mit.edu
Recent work in natural language processing (NLP) has yielded appealing results from
scaling model parameters and training data; however, using only scale to improve …

Compositionality decomposed: How do neural networks generalise?

D Hupkes, V Dankers, M Mul, E Bruni - Journal of Artificial Intelligence …, 2020 - jair.org
Despite a multitude of empirical studies, little consensus exists on whether neural networks
are able to generalise compositionally, a controversy that, in part, stems from a lack of …

Transformers can achieve length generalization but not robustly

Y Zhou, U Alon, X Chen, X Wang, R Agarwal… - arXiv preprint arXiv …, 2024 - arxiv.org
Length generalization, defined as the ability to extrapolate from shorter training sequences
to longer test ones, is a significant challenge for language models. This issue persists even …

Functional interpolation for relative positions improves long context transformers

S Li, C You, G Guruganesh, J Ainslie… - arXiv preprint arXiv …, 2023 - arxiv.org
Preventing the performance decay of Transformers on inputs longer than those used for
training has been an important challenge in extending the context length of these models …
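The title points at a learned, functional form of relative position bias. Below is a minimal sketch of what such a bias could look like: a small MLP mapping a log-normalized relative distance to one additive attention bias per head. The class name, the log transform, and the normalization by query position are illustrative assumptions drawn from the title and snippet, not confirmed details of the paper's method.

```python
import torch
import torch.nn as nn

class FunctionalRelativeBias(nn.Module):
    """Hypothetical: maps normalized relative distance to an additive
    attention bias via an MLP, one output channel per attention head."""

    def __init__(self, num_heads: int, hidden: int = 32):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, num_heads)
        )

    def forward(self, seq_len: int) -> torch.Tensor:
        q = torch.arange(seq_len).unsqueeze(1)   # query positions i, shape (L, 1)
        k = torch.arange(seq_len).unsqueeze(0)   # key positions j, shape (1, L)
        rel = (q - k).clamp(min=0).float()       # causal relative distance i - j
        # Normalizing by the query position keeps the MLP input in [0, 1]
        # for any sequence length, so longer-than-trained inputs stay
        # in-distribution for the bias function (the "interpolation" idea).
        norm = torch.log1p(rel) / torch.log1p(q.float().clamp(min=1))
        bias = self.mlp(norm.unsqueeze(-1))      # (L, L, num_heads)
        return bias.permute(2, 0, 1)             # (num_heads, L, L), added to logits
```

In use, the returned tensor would be added to the pre-softmax attention logits alongside the usual causal mask; because the bias is a function of a bounded input rather than a lookup table of fixed size, it is defined for arbitrary sequence lengths.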

State-of-the-art generalisation research in NLP: a taxonomy and review

D Hupkes, M Giulianelli, V Dankers, M Artetxe… - arXiv preprint arXiv …, 2022 - arxiv.org
The ability to generalise well is one of the primary desiderata of natural language
processing (NLP). Yet, what 'good generalisation' entails and how it should be evaluated is …

Length generalization in arithmetic transformers

S Jelassi, S d'Ascoli, C Domingo-Enrich, Y Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
We examine how transformers cope with two challenges: learning basic integer arithmetic,
and generalizing to longer sequences than seen during training. We find that relative …
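The snippet describes the standard length-generalization setup for arithmetic: train on short operands, test on strictly longer ones. A minimal sketch of such a split is below; the digit ranges and dataset sizes are illustrative choices, not the paper's exact configuration.

```python
import random

def make_example(n_digits: int) -> tuple[str, str]:
    """Sample an addition problem whose operands each have n_digits digits."""
    a = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    b = random.randint(10 ** (n_digits - 1), 10 ** n_digits - 1)
    return f"{a}+{b}=", str(a + b)

# Train on short operands; test on longer, entirely unseen lengths.
train = [make_example(random.randint(1, 5)) for _ in range(10_000)]
test = [make_example(random.randint(6, 10)) for _ in range(1_000)]

# A model that has learned the carrying algorithm, rather than
# length-specific surface patterns, should transfer from train to test.
```

Under this split, test accuracy directly measures extrapolation: no test operand length ever appears during training, so memorizing per-length patterns cannot help.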