Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

K Oko, Y Song, T Suzuki, D Wu - arXiv preprint arXiv:2406.11828, 2024 - arxiv.org
We study the computational and sample complexity of learning a target function
$f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x)=\frac{1}{\sqrt …

Learning Gaussian multi-index models with gradient flow: Time complexity and directional convergence

B Simsek, A Bendjeddou, D Hsu - arXiv preprint arXiv:2411.08798, 2024 - arxiv.org
This work focuses on the gradient flow dynamics of a neural network model that uses
correlation loss to approximate a multi-index function on high-dimensional standard …

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

JD Lee, K Oko, T Suzuki, D Wu - arXiv preprint arXiv:2406.01581, 2024 - arxiv.org
We study the problem of gradient descent learning of a single-index target function
$f_*(\boldsymbol{x})=\textstyle\sigma_*\left(\langle\boldsymbol{x},\boldsymbol …

Pretrained transformer efficiently learns low-dimensional target functions in-context

K Oko, Y Song, T Suzuki, D Wu - arXiv preprint arXiv:2411.02544, 2024 - arxiv.org
Transformers can efficiently learn in-context from example demonstrations. Most existing
theoretical analyses studied the in-context learning (ICL) ability of transformers for linear …

Learning multi-index models with neural networks via mean-field langevin dynamics

A Mousavi-Hosseini, D Wu, MA Erdogdu - arXiv preprint arXiv:2408.07254, 2024 - arxiv.org
We study the problem of learning multi-index models in high dimensions using a two-layer
neural network trained with the mean-field Langevin algorithm. Under mild distributional …
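
For context across several of these entries, a multi-index model takes the standard form below. This is a generic statement of the setting; the symbols $g$ and $u_1,\dots,u_k$ are illustrative and do not match any single paper's exact parameterization:

$$
f_*(x) = g\bigl(\langle u_1, x\rangle, \dots, \langle u_k, x\rangle\bigr), \qquad u_1,\dots,u_k \in \mathbb{R}^d,\quad k \ll d,
$$

so that $f_*$ depends on $x \in \mathbb{R}^d$ only through its projection onto the $k$-dimensional subspace spanned by $u_1,\dots,u_k$; the single-index case is $k=1$.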

Gradient dynamics for low-rank fine-tuning beyond kernels

AK Dayi, S Chen - arXiv preprint arXiv:2411.15385, 2024 - arxiv.org
LoRA has emerged as one of the de facto methods for fine-tuning foundation models with
low computational cost and memory footprint. The idea is to only train a low-rank …
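
For context, the standard LoRA parameterization (Hu et al., 2021) freezes the pretrained weight matrix $W_0 \in \mathbb{R}^{d\times k}$ and trains only a low-rank update, so the fine-tuned weights are

$$
W = W_0 + BA, \qquad B \in \mathbb{R}^{d\times r},\ A \in \mathbb{R}^{r\times k},\ r \ll \min(d,k),
$$

reducing the number of trainable parameters from $dk$ to $r(d+k)$.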

Fundamental limits of learning in sequence multi-index models and deep attention networks: High-dimensional asymptotics and sharp thresholds

E Troiani, H Cui, Y Dandi, F Krzakala… - arXiv preprint arXiv …, 2025 - arxiv.org
In this manuscript, we study the learning of deep attention neural networks, defined as the
composition of multiple self-attention layers, with tied and low-rank weights. We first …

On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning

TT Zhang, B Moniri, A Nagwekar, F Rahman… - arXiv preprint arXiv …, 2025 - arxiv.org
Layer-wise preconditioning methods are a family of memory-efficient optimization algorithms
that introduce preconditioners per axis of each layer's weight tensors. These methods have …

Robust Feature Learning for Multi-Index Models in High Dimensions

A Mousavi-Hosseini, A Javanmard… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, there have been numerous studies on feature learning with neural networks,
specifically on learning single- and multi-index models where the target is a function of a low …

Optimal Spectral Transitions in High-Dimensional Multi-Index Models

L Defilippis, Y Dandi, P Mergny, F Krzakala… - arXiv preprint arXiv …, 2025 - arxiv.org
We consider the problem of how many samples from a Gaussian multi-index model are
required to weakly reconstruct the relevant index subspace. Despite its increasing popularity …