Faith and fate: Limits of transformers on compositionality

N Dziri, X Lu, M Sclar, XL Li, L Jiang… - Advances in …, 2024 - proceedings.neurips.cc
Transformer large language models (LLMs) have sparked admiration for their exceptional
performance on tasks that demand intricate multi-step reasoning. Yet, these models …
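This paper probes compositional tasks whose solutions decompose into a graph of simpler subtasks, with multi-digit multiplication as one running example. Below is a minimal sketch, in our own framing, that makes that subtask structure explicit; the decomposition is just standard long multiplication, not the paper's evaluation code.

```python
# Minimal sketch of the compositional structure behind multi-digit
# multiplication, one of the tasks studied in this paper. The split into
# single-digit partial products plus a final sum is ordinary long
# multiplication; recording each step as an explicit "subtask" is our own
# illustration of the computation-graph view.

def long_multiply(a: int, b: int):
    """Multiply a * b while recording every intermediate subtask result."""
    subtasks = []
    total = 0
    for place, digit_char in enumerate(reversed(str(b))):
        digit = int(digit_char)
        partial = a * digit * (10 ** place)   # one single-digit subtask
        subtasks.append((digit, place, partial))
        total += partial                      # aggregation subtask
    return total, subtasks

if __name__ == "__main__":
    result, steps = long_multiply(387, 46)
    for digit, place, partial in steps:
        print(f"digit {digit} at 10^{place}: partial product {partial}")
    print("final answer:", result, "(check:", 387 * 46, ")")
```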

Generative learning for nonlinear dynamics

W Gilpin - Nature Reviews Physics, 2024 - nature.com
Modern generative machine learning models are able to create realistic outputs far beyond
their training data, such as photorealistic artwork, accurate protein structures or …

Towards revealing the mystery behind chain of thought: a theoretical perspective

G Feng, B Zhang, Y Gu, H Ye, D He… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent studies have discovered that Chain-of-Thought prompting (CoT) can dramatically
improve the performance of Large Language Models (LLMs), particularly when dealing with …
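The snippet refers to Chain-of-Thought prompting, where the model is asked to emit intermediate steps before the final answer; the paper's contribution is a theoretical account of why this helps. A minimal illustration of the two prompt styles follows; the wording of the prompts is our own example, not taken from the paper.

```python
# Illustrative contrast between a direct-answer prompt and a chain-of-thought
# prompt for a small arithmetic word problem. The prompt text is our own
# example; the paper analyzes why emitting intermediate steps lets
# bounded-depth transformers express longer computations.

question = ("A shop sells pens in boxes of 12. "
            "How many pens are in 7 boxes plus 5 loose pens?")

direct_prompt = f"Q: {question}\nA:"

cot_prompt = (
    f"Q: {question}\n"
    "A: Let's think step by step.\n"
    "Each box has 12 pens, so 7 boxes hold 7 * 12 = 84 pens.\n"
    "Adding the 5 loose pens gives 84 + 5 = 89.\n"
    "The answer is 89."
)

print(direct_prompt)
print("---")
print(cot_prompt)
```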

Transformers as statisticians: Provable in-context learning with in-context algorithm selection

Y Bai, F Chen, H Wang, C Xiong… - Advances in neural …, 2023 - proceedings.neurips.cc
Neural sequence models based on the transformer architecture have demonstrated
remarkable in-context learning (ICL) abilities, where they can perform new tasks …
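This paper proves that a transformer can implement standard estimators on the in-context examples and select among them. The numpy sketch below spells out one such selection procedure explicitly (ridge regression over several regularization strengths, chosen by held-out error on the prompt data); it is our external illustration of the behavior, not the paper's internal construction.

```python
# A minimal, explicit version of "in-context algorithm selection": fit ridge
# regression with several regularization strengths on the in-context examples
# and keep the one with the lowest validation error. The paper shows a
# transformer can carry out this kind of selection internally; the explicit
# procedure below is our own illustration.
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 40
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_true + 0.5 * rng.normal(size=n)

X_train, y_train = X[:30], y[:30]
X_val, y_val = X[30:], y[30:]

def ridge(X, y, lam):
    """Closed-form ridge solution (lam = 0 reduces to least squares)."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

best_lam, best_err = None, np.inf
for lam in [0.0, 0.1, 1.0, 10.0]:
    w_hat = ridge(X_train, y_train, lam)
    err = np.mean((X_val @ w_hat - y_val) ** 2)
    if err < best_err:
        best_lam, best_err = lam, err

print(f"selected lambda = {best_lam}, validation MSE = {best_err:.3f}")
```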

Representational strengths and limitations of transformers

C Sanford, DJ Hsu, M Telgarsky - Advances in Neural …, 2024 - proceedings.neurips.cc
Attention layers, as commonly used in transformers, form the backbone of modern deep
learning, yet there is no mathematical description of their benefits and deficiencies as …

Hidden progress in deep learning: SGD learns parities near the computational limit

B Barak, B Edelman, S Goel… - Advances in …, 2022 - proceedings.neurips.cc
There is mounting evidence of emergent phenomena in the capabilities of deep learning
methods as we scale up datasets, model sizes, and training times. While there are some …
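The task named in the title is learning a k-sparse parity of n bits with SGD: the label is determined by a hidden subset of k coordinates, and nothing about the inputs reveals which ones. A minimal data generator for that task follows; the sizes and variable names are our own choices.

```python
# Minimal data generator for the k-sparse parity task referenced in the title:
# the label is the parity (product over {-1, +1} entries) of a hidden subset
# of k out of n input coordinates. Sizes and names here are our own choices.
import numpy as np

rng = np.random.default_rng(0)
n, k, num_samples = 30, 3, 8
support = rng.choice(n, size=k, replace=False)   # hidden relevant coordinates

X = rng.choice([-1.0, 1.0], size=(num_samples, n))
y = np.prod(X[:, support], axis=1)               # parity over the support

print("hidden support:", sorted(support.tolist()))
print("labels:", y)
```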

Looped transformers as programmable computers

A Giannou, S Rajput, J Sohn, K Lee… - International …, 2023 - proceedings.mlr.press
We present a framework for using transformer networks as universal computers by
programming them with specific weights and placing them in a loop. Our input sequence …
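The snippet describes fixing a transformer's weights by hand and repeatedly feeding its output back as input, so that the loop executes a program over a scratchpad. The sketch below shows only that control flow in the abstract: a fixed update applied until a halt flag is set, with a toy counter standing in for the paper's hand-constructed transformer block.

```python
# Abstract sketch of the "loop a fixed network over a scratchpad" idea: one
# fixed update function is applied repeatedly to a state vector until a halt
# flag is set. The toy update below merely counts down a register; in the
# paper, hand-constructed transformer weights play the role of `fixed_block`.
import numpy as np

def fixed_block(state: np.ndarray) -> np.ndarray:
    """One pass of the (toy) fixed computation: decrement a counter and
    raise the halt flag when it reaches zero."""
    counter, halt = state
    counter = max(counter - 1.0, 0.0)
    halt = 1.0 if counter == 0.0 else 0.0
    return np.array([counter, halt])

state = np.array([5.0, 0.0])    # scratchpad: [counter, halt_flag]
steps = 0
while state[1] == 0.0:          # outer loop re-feeds the output as input
    state = fixed_block(state)
    steps += 1

print(f"halted after {steps} iterations, final state = {state}")
```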

Trained transformers learn linear models in-context

R Zhang, S Frei, PL Bartlett - arXiv preprint arXiv:2306.09927, 2023 - arxiv.org
Attention-based neural networks such as transformers have demonstrated a remarkable
ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …
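The setting studied here is in-context linear regression: each prompt carries (x, y) pairs drawn from a freshly sampled linear model, followed by a query input whose label the model must predict. A minimal sketch of that data-generating process, together with the least-squares fit on the prompt pairs as the natural in-context baseline, is below; sizes and variable names are our own.

```python
# Minimal sketch of the in-context linear regression setting: each prompt
# carries (x_i, y_i) pairs from a freshly sampled weight vector w, followed by
# a query x whose label must be predicted from the prompt alone. The
# least-squares fit on the prompt pairs is the natural in-context baseline;
# sizes and variable names are our own choices.
import numpy as np

rng = np.random.default_rng(0)
d, num_pairs = 4, 16

w = rng.normal(size=d)                        # task-specific weight vector
X_prompt = rng.normal(size=(num_pairs, d))    # in-context examples
y_prompt = X_prompt @ w
x_query = rng.normal(size=d)

# Least-squares estimate from the prompt alone (noiseless, so it recovers w).
w_hat, *_ = np.linalg.lstsq(X_prompt, y_prompt, rcond=None)

print("true label:     ", float(x_query @ w))
print("in-context pred:", float(x_query @ w_hat))
```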