Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations
We study the computational and sample complexity of learning a target function $f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x)=\frac{1}{\sqrt…
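The snippet's formula is cut off; below is a minimal sketch of the kind of additive target this line of work studies, under the assumption that the normalization is $1/\sqrt{M}$ over $M$ orthonormal ridge directions and that the link is a Hermite-type polynomial (both assumptions, not necessarily the paper's exact setup):

```python
import numpy as np

rng = np.random.default_rng(0)
d, M = 64, 8  # ambient dimension, number of additive components (assumed)

# Orthonormal ridge directions v_1, ..., v_M (assumed structure).
V = np.linalg.qr(rng.standard_normal((d, M)))[0].T  # shape (M, d)

def sigma(z):
    """Hermite-type link; the actual sigma in the paper is unspecified here."""
    return z**2 - 1.0

def f_star(x):
    """Additive ridge combination: f_*(x) = M^{-1/2} * sum_m sigma(<v_m, x>)."""
    return sigma(V @ x).sum() / np.sqrt(M)

x = rng.standard_normal(d)
print(f_star(x))
```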
Learning gaussian multi-index models with gradient flow: Time complexity and directional convergence
This work focuses on the gradient flow dynamics of a neural network model that uses
correlation loss to approximate a multi-index function on high-dimensional standard …
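A minimal discretized sketch of gradient-flow training under the correlation loss $L(w) = -\mathbb{E}[f_w(x)\, f_*(x)]$; the single tanh neuron, the specific multi-index link, and the spherical projection are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 32, 4096

# Multi-index target depending on a 2-dimensional subspace (assumed form).
U = np.linalg.qr(rng.standard_normal((d, 2)))[0]
X = rng.standard_normal((n, d))            # standard Gaussian inputs
y = np.tanh(X @ U[:, 0]) * (X @ U[:, 1])   # hypothetical multi-index link

# Single tanh neuron trained on the correlation loss
# L(w) = -E[ tanh(<w, x>) * f_*(x) ]; gradient flow discretized by GD.
w = rng.standard_normal(d) / np.sqrt(d)
lr = 0.5
for _ in range(200):
    pre = X @ w
    grad = -(X.T @ ((1 - np.tanh(pre)**2) * y)) / n   # dL/dw
    w -= lr * grad
    w /= np.linalg.norm(w)   # projected dynamics on the sphere (common in these analyses)

# Alignment of w with the target subspace tracks directional convergence.
print(np.linalg.norm(U.T @ w))
```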
Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit
We study the problem of gradient descent learning of a single-index target function $f_*(\boldsymbol{x})=\sigma_*\left(\langle\boldsymbol{x},\boldsymbol…
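A sketch of the online SGD setting such abstracts refer to: one fresh Gaussian sample per step, squared loss, and a student neuron matched to an assumed link $\sigma_*(z)=z^3-3z$ (the third Hermite polynomial; the paper's actual link is truncated out of the snippet):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 128
theta = rng.standard_normal(d); theta /= np.linalg.norm(theta)  # planted direction

def sigma_star(z):
    return z**3 - 3*z  # assumed link (third Hermite polynomial)

# Online SGD: one fresh Gaussian sample per step, squared loss, student
# neuron matched to the link -- the classical single-index training setup.
w = rng.standard_normal(d); w /= np.linalg.norm(w)
lr = 0.01 / d
for _ in range(100_000):
    x = rng.standard_normal(d)
    y = sigma_star(x @ theta)
    pre = x @ w
    resid = sigma_star(pre) - y
    w -= lr * resid * (3*pre**2 - 3) * x   # chain rule through sigma_star
    w /= np.linalg.norm(w)

print(abs(w @ theta))  # overlap with the planted direction; grows with budget
```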
Pretrained transformer efficiently learns low-dimensional target functions in-context
Transformers can efficiently learn in-context from example demonstrations. Most existing
theoretical analyses studied the in-context learning (ICL) ability of transformers for linear …
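For context, the standard theoretical ICL setup for regression serializes demonstration pairs into a prompt and compares the transformer against an in-context baseline fit only on those examples; the ridge baseline and noise level below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
d, N = 8, 32

# A fresh regression task per prompt: y_i = <beta, x_i> + noise, as in the
# standard theoretical ICL setup for linear functions.
beta = rng.standard_normal(d)
X = rng.standard_normal((N, d))
y = X @ beta + 0.1 * rng.standard_normal(N)
x_query = rng.standard_normal(d)

# The prompt a transformer would see: demonstrations followed by the query.
prompt = np.concatenate([np.column_stack([X, y]).ravel(), x_query])

# Baseline the ICL literature compares against: ridge regression fit on the
# in-context examples only.
lam = 0.1
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
print(beta_hat @ x_query, beta @ x_query)
```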
Learning multi-index models with neural networks via mean-field langevin dynamics
We study the problem of learning multi-index models in high dimensions using a two-layer
neural network trained with the mean-field Langevin algorithm. Under mild distributional …
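A minimal sketch of the mean-field Langevin update on a two-layer network: gradient descent on the neurons plus injected Gaussian noise whose scale is set by an entropic-regularization temperature. The target, width, and temperature are assumptions, and the weight-decay term usually present in these analyses is omitted for brevity:

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, n = 16, 512, 2048

# Multi-index target on Gaussian data (assumed form).
U = np.linalg.qr(rng.standard_normal((d, 2)))[0]
X = rng.standard_normal((n, d))
y = np.prod(np.tanh(X @ U), axis=1)

# Two-layer mean-field network f(x) = (1/m) sum_j tanh(<w_j, x>).
W = rng.standard_normal((m, d))
lr, beta_inv = 0.5, 1e-4   # step size and temperature (entropy regularization)
for _ in range(300):
    H = np.tanh(X @ W.T)            # (n, m) neuron activations
    resid = H.mean(axis=1) - y      # network output minus target
    # Gradient of the squared loss w.r.t. each neuron, plus Langevin noise:
    G = ((1 - H**2) * resid[:, None]).T @ X / (n * m)
    W -= lr * G - np.sqrt(2 * lr * beta_inv) * rng.standard_normal(W.shape)

print(np.mean((np.tanh(X @ W.T).mean(axis=1) - y) ** 2))
```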
Gradient dynamics for low-rank fine-tuning beyond kernels
LoRA has emerged as one of the de facto methods for fine-tuning foundation models with
low computational cost and memory footprint. The idea is to only train a low-rank …
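A minimal sketch of the low-rank adapter the abstract refers to: the pretrained weight $W_0$ stays frozen and only the factors $B, A$ of the rank-$r$ update are trained, with $B$ zero-initialized so fine-tuning starts from the pretrained model:

```python
import numpy as np

rng = np.random.default_rng(5)
d_out, d_in, r = 64, 64, 4   # layer size and adapter rank (assumed)

W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)  # frozen pretrained weight
A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)       # trainable down-projection
B = np.zeros((d_out, r))                                 # zero-init so W starts at W0
alpha = 8.0                                              # LoRA scaling factor

def forward(x):
    """LoRA forward pass: only B and A receive gradients during fine-tuning."""
    return W0 @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
print(np.allclose(forward(x), W0 @ x))  # True before any adapter update
```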
Fundamental limits of learning in sequence multi-index models and deep attention networks: High-dimensional asymptotics and sharp thresholds
In this manuscript, we study the learning of deep attention neural networks, defined as the
composition of multiple self-attention layers, with tied and low-rank weights. We first …
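One plausible instantiation of such a model: self-attention layers whose query/key product is a single tied rank-$r$ matrix, composed several times. The tying scheme, identity value matrix, and shapes below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)
d, L, r = 32, 16, 2   # token dimension, sequence length, weight rank (assumed)

def softmax(Z, axis=-1):
    Z = Z - Z.max(axis=axis, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=axis, keepdims=True)

# Tied low-rank attention: the query/key product is one rank-r matrix U V^T.
U = rng.standard_normal((d, r)) / np.sqrt(d)
V = rng.standard_normal((d, r)) / np.sqrt(d)
WQK = U @ V.T

def attention_layer(X):
    """One self-attention layer with tied low-rank weights and identity values."""
    scores = (X @ WQK @ X.T) / np.sqrt(d)
    return softmax(scores, axis=-1) @ X

# Deep attention network = composition of such layers (weights tied across layers).
X = rng.standard_normal((L, d))
for _ in range(3):
    X = attention_layer(X)
print(X.shape)
```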
On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning
Layer-wise preconditioning methods are a family of memory-efficient optimization algorithms
that introduce preconditioners per axis of each layer's weight tensors. These methods have …
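For concreteness, a single Shampoo-style preconditioned step, a well-known member of this family: the gradient of a weight matrix is preconditioned separately along its row and column axes via inverse fourth roots of per-axis second-moment statistics (the damping constant is an assumption):

```python
import numpy as np

rng = np.random.default_rng(7)
m, n = 32, 16

G = rng.standard_normal((m, n))       # gradient of one layer's weight matrix
L = G @ G.T + 1e-4 * np.eye(m)        # left (row-axis) preconditioner statistic
R = G.T @ G + 1e-4 * np.eye(n)        # right (column-axis) preconditioner statistic

def inv_fourth_root(M):
    """M^{-1/4} via eigendecomposition (M is symmetric PSD plus damping)."""
    vals, vecs = np.linalg.eigh(M)
    return (vecs * vals**-0.25) @ vecs.T

# Shampoo-style update: precondition each axis of the weight tensor separately.
step = inv_fourth_root(L) @ G @ inv_fourth_root(R)
print(step.shape)
```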
Robust Feature Learning for Multi-Index Models in High Dimensions
Recently, there have been numerous studies on feature learning with neural networks,
specifically on learning single- and multi-index models where the target is a function of a low …
Optimal Spectral Transitions in High-Dimensional Multi-Index Models
We consider the problem of how many samples from a Gaussian multi-index model are
required to weakly reconstruct the relevant index subspace. Despite its increasing popularity …
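The canonical spectral estimator in this setting eigendecomposes $\frac{1}{n}\sum_i \mathcal{T}(y_i)\, x_i x_i^\top$ and reads the index subspace off its extremal eigenvectors; the choice of preprocessing $\mathcal{T}$ is what optimality refers to. The identity preprocessing, link function, and top-edge selection below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(8)
d, n, k = 64, 8000, 2

# Gaussian multi-index model: y depends on x only through a k-dim projection.
U = np.linalg.qr(rng.standard_normal((d, k)))[0]
X = rng.standard_normal((n, d))
Z = X @ U
y = Z[:, 0] ** 2 + np.tanh(Z[:, 0] * Z[:, 1])   # hypothetical link

# Spectral estimator: eigenvectors of (1/n) sum_i T(y_i) x_i x_i^T, with the
# identity preprocessing T(y) = y; outliers can sit at either spectral edge,
# so taking the top-k here is a simplification.
M = (X * y[:, None]).T @ X / n
vals, vecs = np.linalg.eigh(M)
U_hat = vecs[:, -k:]   # top-k eigenvectors as subspace estimate

# Overlap with the true subspace (weak recovery = overlap bounded away from 0).
print(np.linalg.norm(U_hat.T @ U, 2))
```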