Trained transformers learn linear models in-context

R Zhang, S Frei, PL Bartlett - Journal of Machine Learning Research, 2024 - jmlr.org
Attention-based neural networks such as transformers have demonstrated a remarkable
ability to exhibit in-context learning (ICL): Given a short prompt sequence of tokens from an …
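Several entries in this list (Zhang, Frei & Bartlett; Bhattamishra et al.; Zheng et al.) share the stylized ICL setup referenced in these abstracts: each prompt interleaves input-output pairs from a freshly sampled linear task, followed by a query input. A minimal sketch of that prompt construction, assuming the common zero-padded interleaved token encoding (the function name and dimensions are illustrative, not taken from any listed paper):

```python
# A minimal sketch (not any paper's code) of the stylized ICL regression
# setup: each prompt is a sequence (x_1, y_1, ..., x_N, y_N, x_query),
# with labels generated by a task-specific linear model w.
# make_icl_prompt, d, and n_examples are illustrative names/assumptions.
import numpy as np

def make_icl_prompt(d=5, n_examples=10, rng=np.random.default_rng(0)):
    w = rng.normal(size=d)                     # fresh linear task per prompt
    X = rng.normal(size=(n_examples + 1, d))   # in-context inputs + query
    y = X @ w                                  # noiseless linear labels
    # Interleave (x_i, y_i) pairs as tokens; y is zero-padded to width d+1
    # so input and label tokens share one embedding dimension.
    tokens = np.zeros((2 * n_examples + 1, d + 1))
    tokens[0::2, :d] = X                       # x tokens (incl. final query)
    tokens[1::2, d] = y[:-1]                   # y tokens for demonstrations
    return tokens, y[-1]                       # sequence and held-out target

prompt, target = make_icl_prompt()
print(prompt.shape, target)                    # (21, 6) and the query label
```

Under this framework, a transformer trained across many such prompts is judged on whether its output at the final query token recovers `target`, i.e. whether it has learned the linear model in-context.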

Understanding in-context learning in transformers and LLMs by learning to learn discrete functions

S Bhattamishra, A Patel, P Blunsom… - arXiv preprint arXiv …, 2023 - arxiv.org
In order to understand the in-context learning phenomenon, recent works have adopted a
stylized experimental framework and demonstrated that Transformers can learn gradient …

A theoretical understanding of self-correction through in-context alignment

Y Wang, Y Wu, Z Wei, S Jegelka, Y Wang - arXiv preprint arXiv …, 2024 - arxiv.org
Going beyond mimicking limited human experiences, recent studies show initial evidence
that, like humans, large language models (LLMs) are capable of improving their abilities …

Drift-resilient TabPFN: In-context learning temporal distribution shifts on tabular data

K Helli, D Schnurr, N Hollmann… - Advances in Neural …, 2025 - proceedings.neurips.cc
While most ML models expect independent and identically distributed data, this assumption
is often violated in real-world scenarios due to distribution shifts, resulting in the degradation …

On mesa-optimization in autoregressively trained transformers: Emergence and capability

C Zheng, W Huang, R Wang, G Wu… - Advances in Neural …, 2025 - proceedings.neurips.cc
Autoregressively trained transformers have brought a profound revolution to the world,
especially with their in-context learning (ICL) ability to address downstream tasks. Recently …
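The mesa-optimization claim in this entry, like the gradient-learning result in Bhattamishra et al. above, builds on a known equivalence: an unnormalized (linear) attention readout over the prompt can reproduce a single gradient-descent step on the in-context least-squares loss from zero initialization. A small numerical check of that equivalence (a sketch under illustrative names and learning rate, not any listed paper's construction):

```python
# Hedged illustration of the mesa-optimization view: one gradient-descent
# step on the in-context least-squares loss, starting from w0 = 0, matches
# a single linear-attention readout over the prompt. Variable names and
# the learning rate eta are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(1)
d, n = 5, 50
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))                # in-context inputs
y = X @ w_true                             # in-context labels
x_q = rng.normal(size=d)                   # query input
eta = 0.01

# GD step on L(w) = 0.5 * sum_i (w @ x_i - y_i)^2 from w0 = 0:
# the gradient at w0 is -sum_i y_i x_i, so w1 = eta * X^T y.
w1 = eta * (y @ X)
pred_gd = w1 @ x_q

# Linear attention: keys x_i, values y_i, query x_q, no softmax.
pred_attn = eta * np.sum(y * (X @ x_q))

print(np.isclose(pred_gd, pred_attn))      # True: the two readouts agree
```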

How well does GPT-4V(ision) adapt to distribution shifts? A preliminary investigation

Z Han, G Zhou, R He, J Wang, T Wu, Y Yin… - arXiv preprint arXiv …, 2023 - arxiv.org
In machine learning, generalization against distribution shifts--where deployment conditions
diverge from the training scenarios--is crucial, particularly in fields like climate modeling …

How In-Context Learning Emerges from Training on Unstructured Data: On the Role of Co-Occurrence, Positional Information, and Noise Structures

KC Wibisono, Y Wang - arXiv preprint arXiv:2406.00131, 2024 - arxiv.org
Large language models (LLMs) like transformers have impressive in-context learning (ICL)
capabilities; they can generate predictions for new queries based on input-output …

From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When

KC Wibisono, Y Wang - The Thirty-eighth Annual Conference on …, 2024 - openreview.net
Large language models (LLMs) like transformers demonstrate impressive in-context
learning (ICL) capabilities, allowing them to make predictions for new tasks based on prompt …

In-context learning in presence of spurious correlations

H Harutyunyan, R Darbinyan, S Karapetyan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models exhibit a remarkable capacity for in-context learning, where they
learn to solve tasks given a few examples. Recent work has shown that transformers can be …

Technical Debt in In-Context Learning: Diminishing Efficiency in Long Context

T Joo, D Klabjan - arXiv preprint arXiv:2502.04580, 2025 - arxiv.org
Transformers have demonstrated remarkable in-context learning (ICL) capabilities, adapting
to new tasks by simply conditioning on demonstrations without parameter updates …