Trained transformers learn linear models in-context

R Zhang, S Frei, PL Bartlett - Journal of Machine Learning Research, 2024 - jmlr.org
Attention-based neural networks such as transformers have demonstrated a remarkable
ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …
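
The excerpt cuts off mid-sentence; as a rough, unofficial sketch of the in-context linear-regression setting such papers study (the dimensions, noiseless labels, and least-squares baseline below are my own illustrative choices, not the authors' construction), a prompt supplies labeled example pairs and a query input, and the natural reference predictor is ordinary least squares fit on the prompt alone:

```python
import numpy as np

# Hypothetical illustration of the in-context linear-regression setup:
# a prompt is a set of (x_i, y_i) example pairs plus a query input, and
# the reference solution is the least-squares fit on those examples alone.
rng = np.random.default_rng(0)
d, n = 5, 20                        # input dimension, number of in-context examples
w_star = rng.normal(size=d)         # task-specific weight vector
X = rng.normal(size=(n, d))         # in-context inputs
y = X @ w_star                      # in-context labels (noiseless for simplicity)
x_query = rng.normal(size=d)

w_hat, *_ = np.linalg.lstsq(X, y, rcond=None)   # the "in-context" estimate
print(x_query @ w_hat, x_query @ w_star)        # prediction vs. ground truth
```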

TinyStories: How small can language models be and still speak coherent English?

R Eldan, Y Li - arXiv preprint arXiv:2305.07759, 2023 - arxiv.org
Language models (LMs) are powerful tools for natural language processing, but they often
struggle to produce coherent and fluent text when they are small. Models with around 125M …

Scan and snap: Understanding training dynamics and token composition in 1-layer transformer

Y Tian, Y Wang, B Chen, SS Du - Advances in neural …, 2023 - proceedings.neurips.cc
The Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …

Birth of a transformer: A memory viewpoint

A Bietti, V Cabannes, D Bouchacourt… - Advances in …, 2023 - proceedings.neurips.cc
Large language models based on transformers have achieved great empirical successes.
However, as they are deployed more widely, there is a growing need to better understand …

How transformers learn causal structure with gradient descent

E Nichani, A Damian, JD Lee - arXiv preprint arXiv:2402.14735, 2024 - arxiv.org
The incredible success of transformers on sequence modeling tasks can be largely
attributed to the self-attention mechanism, which allows information to be transferred …
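
Since the snippet truncates right where it describes self-attention, a bare-bones single-head, unbatched attention forward pass (made-up dimensions, causal masking omitted for brevity; not the authors' model) may help fix ideas about how information is transferred across positions:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over a sequence X of shape (T, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # (T, T) pairwise similarities
    A = np.exp(scores - scores.max(axis=-1, keepdims=True))
    A = A / A.sum(axis=-1, keepdims=True)              # row-wise softmax
    return A @ V                                       # each position mixes others' values

rng = np.random.default_rng(0)
T, d = 8, 16
X = rng.normal(size=(T, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)                    # shape (T, d)
print(out.shape)
```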

Exposing attention glitches with flip-flop language modeling

B Liu, J Ash, S Goel… - Advances in Neural …, 2023 - proceedings.neurips.cc
Why do large language models sometimes output factual inaccuracies and exhibit
erroneous reasoning? The brittleness of these models, particularly when executing long …
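
The excerpt stops before describing the flip-flop task itself; as I understand it (my own paraphrase and token names), a flip-flop string interleaves write, ignore, and read instructions with bits, and a correct model must echo the most recently written bit at every read. A small generator sketch:

```python
import random

def flip_flop_sequence(length=16, seed=0):
    """Generate one flip-flop string as (instruction, bit) pairs.

    'w b' writes bit b to memory, 'i b' must be ignored,
    'r b' reads: b is the most recently written bit (the target).
    """
    rng = random.Random(seed)
    memory = rng.choice("01")
    tokens = ["w", memory]                    # start with a definite write
    for _ in range(length):
        op = rng.choice("wir")
        if op == "w":
            memory = rng.choice("01")
            tokens += ["w", memory]
        elif op == "i":
            tokens += ["i", rng.choice("01")]
        else:
            tokens += ["r", memory]           # correct answer: the last written bit
    return tokens

print(" ".join(flip_flop_sequence()))
```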

In-context learning with transformers: Softmax attention adapts to function Lipschitzness

L Collins, A Parulekar, A Mokhtari… - Advances in …, 2025 - proceedings.neurips.cc
A striking property of transformers is their ability to perform in-context learning (ICL), a
machine learning framework in which the learner is presented with a novel context during …
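
The snippet breaks off before the setup; one common way to see why softmax attention relates to function smoothness, in a simplified form of my own (not the authors' construction), is as a Nadaraya-Watson-style estimator whose temperature acts like a bandwidth over the in-context examples:

```python
import numpy as np

def softmax_attention_predict(x_query, X, y, scale=1.0):
    """Predict y at x_query by softmax-weighting the in-context labels;
    `scale` plays the role of an attention temperature / effective bandwidth."""
    logits = X @ x_query / scale
    w = np.exp(logits - logits.max())
    w /= w.sum()
    return w @ y

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                  # in-context inputs
y = np.array([np.sin(x).sum() for x in X])    # labels from some smooth target function
x_q = rng.normal(size=4)
print(softmax_attention_predict(x_q, X, y, scale=0.5))
```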

Towards best practices of activation patching in language models: Metrics and methods

F Zhang, N Nanda - arXiv preprint arXiv:2309.16042, 2023 - arxiv.org
Mechanistic interpretability seeks to understand the internal mechanisms of machine
learning models, where localization--identifying the important model components--is a key …
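
The abstract is cut short before defining activation patching; as a toy illustration on a tiny hand-rolled network (not any real language model or the paper's tooling), patching splices an activation cached from a clean run into a corrupted run and measures how much of the clean behavior is restored:

```python
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 8)), rng.normal(size=(8, 2))

def forward(x, patch=None):
    """Two-layer toy network; optionally overwrite one hidden activation."""
    h = np.maximum(x @ W1, 0.0)               # hidden activations
    if patch is not None:
        idx, value = patch
        h[idx] = value                        # the patch: splice in a cached activation
    return h @ W2

x_clean, x_corrupt = rng.normal(size=4), rng.normal(size=4)
h_clean = np.maximum(x_clean @ W1, 0.0)       # cache activations from the clean run
out_clean, out_corrupt = forward(x_clean), forward(x_corrupt)

for i in range(8):                            # patch each hidden unit in turn
    patched = forward(x_corrupt, patch=(i, h_clean[i]))
    # how much does patching unit i move the output back toward the clean run?
    recovered = np.linalg.norm(out_corrupt - out_clean) - np.linalg.norm(patched - out_clean)
    print(f"unit {i}: recovery {recovered:+.3f}")
```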

JoMA: Demystifying multilayer transformers via joint dynamics of MLP and attention

Y Tian, Y Wang, Z Zhang, B Chen, S Du - arXiv preprint arXiv:2310.00535, 2023 - arxiv.org
We propose Joint MLP/Attention (JoMA) dynamics, a novel mathematical framework to
understand the training procedure of multilayer Transformer architectures. This is achieved …