Pretrained transformer efficiently learns low-dimensional target functions in-context

K Oko, Y Song, T Suzuki, D Wu - Advances in Neural Information Processing Systems, 2025 - proceedings.neurips.cc
Transformers can efficiently learn in-context from example demonstrations. Most existing
theoretical analyses have studied the in-context learning (ICL) ability of transformers for linear …
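
As a rough illustration of the setting such analyses consider (a sketch of my own; the prompt layout, dimensions, and linear task are assumptions, not taken from the paper), in-context learning feeds the model demonstration pairs $(x_i, y_i)$ followed by a query $x$, all within a single sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_demos = 8, 16  # input dimension and number of demonstrations (assumed)

# Draw one task: a hidden linear map w, then demonstration pairs (x_i, y_i).
w = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n_demos, d))
y = X @ w

# Pack demonstrations plus the query into one sequence, one (x, y) "token"
# per row; the query row carries a placeholder label of 0.
x_query = rng.standard_normal(d)
prompt = np.vstack([np.hstack([X, y[:, None]]),
                    np.hstack([x_query, 0.0])])

# A pretrained transformer would map `prompt` to a prediction for x_query;
# here we only exhibit the sequence shape the model consumes.
print(prompt.shape)  # (n_demos + 1, d + 1)
```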

The computational complexity of learning Gaussian single-index models

A Damian, L Pillaud-Vivien, JD Lee, J Bruna - arXiv preprint arXiv …, 2024 - arxiv.org
Single-Index Models are high-dimensional regression problems with planted structure,
whereby labels depend on an unknown one-dimensional projection of the input via a …
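
A minimal data-generating sketch of this function class (my own illustration; the cubic Hermite link and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 1000  # ambient dimension and sample size (assumed)

# Planted direction w on the sphere; Gaussian inputs x ~ N(0, I_d).
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
X = rng.standard_normal((n, d))

# Labels depend on x only through the 1-d projection <w, x>; the link
# sigma is an arbitrary example here, the Hermite polynomial He_3.
sigma = lambda z: z**3 - 3 * z
y = sigma(X @ w) + 0.1 * rng.standard_normal(n)
```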

On the complexity of learning sparse functions with statistical and gradient queries

N Joshi, T Misiakiewicz, N Srebro - arXiv preprint arXiv:2407.05622, 2024 - arxiv.org
The goal of this paper is to investigate the complexity of gradient algorithms when learning
sparse functions (juntas). We introduce a type of Statistical Queries ($\mathsf{SQ}$), which …
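
For concreteness, a hedged sketch of both objects the abstract names: a $k$-junta, whose label depends on only $k$ of the $d$ coordinates, and an $\mathsf{SQ}$ oracle that answers expectation queries up to a tolerance $\tau$ (the parity link and all parameters are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 30, 3, 1000  # ambient dimension, junta size, sample size (assumed)

# A k-junta: the label depends only on a hidden subset S of k coordinates;
# here the link on those coordinates is a parity (one possible choice).
S = rng.choice(d, size=k, replace=False)
X = rng.choice([-1, 1], size=(n, d))
y = np.prod(X[:, S], axis=1)

# A statistical-query algorithm never touches (X, y) directly: it only
# receives expectations E[phi(x, y)], each corrupted up to tolerance tau.
def sq_oracle(phi, tau=0.01):
    return phi(X, y).mean() + rng.uniform(-tau, tau)

# Example query: the correlation of the label with coordinate 0.
print(sq_oracle(lambda X, y: y * X[:, 0]))
```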

Repetita iuvant: Data repetition allows SGD to learn high-dimensional multi-index functions

L Arnaboldi, Y Dandi, F Krzakala, L Pesce… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural networks can identify low-dimensional relevant structures within high-dimensional
noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we …
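
"Repetita iuvant" is Latin for "repetition helps". The protocol contrast at stake can be sketched in a few lines (a toy of my own; the even single-index target, step size, and spherical normalization are assumptions): multi-pass SGD revisits the same batch, while one-pass SGD sees each sample once.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, lr = 20, 200, 0.01  # dimension, batch size, step size (assumed)

# An even single-index target (information exponent 2), fit by a matched
# single neuron; whether extra passes over the same batch help is the
# phenomenon the paper studies.
w_star = rng.standard_normal(d); w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
y = (X @ w_star) ** 2

def sgd(n_passes):
    w = rng.standard_normal(d); w /= np.linalg.norm(w)
    for _ in range(n_passes):                      # n_passes > 1 = data repetition
        for x_i, y_i in zip(X, y):
            resid = (x_i @ w) ** 2 - y_i
            w -= lr * resid * 2 * (x_i @ w) * x_i  # squared-loss gradient step
            w /= np.linalg.norm(w)                 # project back onto the sphere
    return abs(w @ w_star)                         # overlap with planted direction

print(sgd(n_passes=1), sgd(n_passes=10))
```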

Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

K Oko, Y Song, T Suzuki, D Wu - arXiv preprint arXiv:2406.11828, 2024 - arxiv.org
We study the computational and sample complexity of learning a target function
$f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x)=\frac{1}{\sqrt…
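
The displayed target is truncated above; an additive "ridge combination" consistent with the title would sum one-dimensional links applied along orthonormal directions. A hedged generator (the Hermite links and all parameters are my choices, not the paper's definition):

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, n = 64, 8, 1000  # dimension, number of ridge directions, samples (assumed)

# M orthonormal directions v_1, ..., v_M (columns of V) and M "diverse"
# one-dimensional links, here the Hermite polynomials He_2, ..., He_{M+1}.
V, _ = np.linalg.qr(rng.standard_normal((d, M)))
links = [lambda z, k=k: np.polynomial.hermite_e.hermeval(z, [0] * k + [1])
         for k in range(2, M + 2)]

# Additive target: average the links, each applied along its own direction.
X = rng.standard_normal((n, d))
Z = X @ V                                       # (n, M) ridge projections
y = sum(g(Z[:, m]) for m, g in enumerate(links)) / np.sqrt(M)
```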

Learning orthogonal multi-index models: A fine-grained information exponent analysis

Y Ren, JD Lee - arXiv preprint arXiv:2410.09678, 2024 - arxiv.org
The information exponent (Ben Arous et al. [2021]), which is equivalent to the lowest degree
in the Hermite expansion of the link function for Gaussian single-index models, has played …
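
Reading that definition off directly: the information exponent of a link $\sigma$ is the smallest $k \ge 1$ with nonzero Hermite coefficient $\mathbb{E}[\sigma(z)\,\mathrm{He}_k(z)]$ for $z \sim \mathcal{N}(0,1)$. A quadrature sketch (the example links and tolerance are mine):

```python
import numpy as np

def information_exponent(sigma, k_max=8, tol=1e-8):
    """Smallest k >= 1 with E[sigma(z) He_k(z)] != 0 for z ~ N(0, 1)."""
    # Gauss-Hermite quadrature integrates against exp(-t^2); substituting
    # z = sqrt(2) t turns it into an integral against the standard Gaussian.
    t, w = np.polynomial.hermite.hermgauss(80)
    z = np.sqrt(2) * t
    w = w / np.sqrt(np.pi)
    for k in range(1, k_max + 1):
        he_k = np.polynomial.hermite_e.hermeval(z, [0] * k + [1])
        if abs(np.sum(w * sigma(z) * he_k)) > tol:
            return k
    return None

print(information_exponent(lambda z: z**3))      # 1: odd link has He_1 mass
print(information_exponent(lambda z: z**2 - 1))  # 2: pure He_2 component
```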

A random matrix theory perspective on the spectrum of learned features and asymptotic generalization capabilities

Y Dandi, L Pesce, H Cui, F Krzakala, YM Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
A key property of neural networks is their capacity to adapt to data during training. Yet,
our current mathematical understanding of feature learning and its relationship to …
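
A minimal experiment in the spirit of this line of work (every architectural choice below is mine, not the paper's setting): take a two-layer network with random first-layer weights, apply one large full-batch gradient step to that layer, and compare the spectrum of the feature Gram matrix before and after; feature learning shows up as outlier eigenvalues leaving the bulk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n, eta = 100, 150, 200, 5.0  # dims, width, samples, step size (assumed)

# Data from a single-index teacher; student features are tanh(W x / sqrt(d)).
w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = np.tanh(X @ w_star)

W = rng.standard_normal((p, d))
a = rng.standard_normal(p) / np.sqrt(p)

def features(W):
    return np.tanh(X @ W.T / np.sqrt(d))

# One large full-batch gradient step on the first layer (squared loss).
resid = features(W) @ a - y                    # (n,) residuals
dphi = 1 - features(W) ** 2                    # tanh' at the pre-activations
grad_W = (dphi * resid[:, None] * a).T @ X / (n * np.sqrt(d))
W1 = W - eta * grad_W

# Spectra of the feature Gram matrix before/after the step.
for F in (features(W), features(W1)):
    eigs = np.linalg.eigvalsh(F @ F.T / p)
    print(eigs[-3:])                           # top of the spectrum
```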

Learning Gaussian multi-index models with gradient flow: Time complexity and directional convergence

B Simsek, A Bendjeddou, D Hsu - arXiv preprint arXiv:2411.08798, 2024 - arxiv.org
This work focuses on the gradient flow dynamics of a neural network model that uses the
correlation loss to approximate a multi-index function on high-dimensional standard …
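
A discretized caricature of these dynamics (my toy instance; the single planted direction, He_3 link, batch estimator, and Euler step are assumptions): projected gradient descent on the population correlation loss $-\mathbb{E}[\sigma(\langle w, x\rangle)\,\sigma(\langle w_*, x\rangle)]$, estimated on fresh batches.

```python
import numpy as np

rng = np.random.default_rng(0)
d, dt, steps = 50, 0.1, 2000  # dimension, Euler step, iterations (assumed)

# Multi-index target restricted here to one planted direction for simplicity.
w_star = rng.standard_normal(d); w_star /= np.linalg.norm(w_star)
sigma = lambda z: z ** 3 - 3 * z        # He_3 link (my choice)
dsigma = lambda z: 3 * z ** 2 - 3

w = rng.standard_normal(d); w /= np.linalg.norm(w)
for _ in range(steps):
    X = rng.standard_normal((512, d))   # fresh batch approximates E[...]
    # gradient of -E[sigma(<w,x>) sigma(<w*,x>)] with respect to w
    g = -(dsigma(X @ w) * sigma(X @ w_star)) @ X / len(X)
    g -= (g @ w) * w                    # project onto the sphere's tangent space
    w -= dt * g                         # forward-Euler gradient-flow step
    w /= np.linalg.norm(w)

print(abs(w @ w_star))                  # directional convergence toward w_star
```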

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

JD Lee, K Oko, T Suzuki, D Wu - arXiv preprint arXiv:2406.01581, 2024 - arxiv.org
We study the problem of gradient descent learning of a single-index target function
$f_*(\boldsymbol{x})=\sigma_*(\langle\boldsymbol{x},\boldsymbol…
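
The truncated display is a single-index target, presumably of the form $f_*(x)=\sigma_*(\langle x,\theta\rangle)$ for some direction $\theta$. A one-pass online SGD sketch for such a target (the even link, step size, and spherical normalization are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, steps, lr = 100, 20_000, 0.5  # dimension, SGD steps, step size (assumed)

# Single-index teacher; the student is a matched single neuron trained
# online: one fresh Gaussian sample per step, never reused (one pass).
theta = rng.standard_normal(d); theta /= np.linalg.norm(theta)
sigma = lambda z: z ** 2 - 1          # example even link (my choice)
dsigma = lambda z: 2 * z

w = rng.standard_normal(d); w /= np.linalg.norm(w)
for _ in range(steps):
    x = rng.standard_normal(d)
    resid = sigma(x @ w) - sigma(x @ theta)
    w -= (lr / d) * resid * dsigma(x @ w) * x   # squared-loss SGD step
    w /= np.linalg.norm(w)                      # spherical normalization

print(abs(w @ theta))  # overlap with the planted direction
```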

Gradient dynamics for low-rank fine-tuning beyond kernels

AK Dayi, S Chen - arXiv preprint arXiv:2411.15385, 2024 - arxiv.org
LoRA has emerged as one of the de facto methods for fine-tuning foundation models with
low computational cost and memory footprint. The idea is to train only a low-rank …
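
The low-rank idea in a few lines (a generic sketch of LoRA-style training, not this paper's analysis setting; dimensions, rank, and the toy task are assumptions): freeze the pretrained weight $W_0$ and train only a rank-$r$ correction $BA$.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, lr = 64, 64, 4, 0.1  # layer dims, LoRA rank, step size (assumed)

# Frozen pretrained weight plus a trainable rank-r update: W = W0 + B @ A.
W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)
B = np.zeros((d_out, r))              # zero init: fine-tuning starts exactly at W0

# A toy fine-tuning task: the target layer is a small perturbation of W0.
X = rng.standard_normal((256, d_in))
Y = X @ (W0 + 0.1 * rng.standard_normal((d_out, d_in))).T

for _ in range(200):
    resid = X @ (W0 + B @ A).T - Y
    G = resid.T @ X / len(X)          # gradient of the squared loss w.r.t. W
    gB, gA = G @ A.T, B.T @ G         # chain rule through W = W0 + B @ A
    B -= lr * gB                      # only the 2 * r * d LoRA factors move;
    A -= lr * gA                      # W0 itself is never updated

print(np.mean((X @ (W0 + B @ A).T - Y) ** 2))  # fine-tuning loss after training
```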