Pretrained transformer efficiently learns low-dimensional target functions in-context

K Oko, Y Song, T Suzuki, D Wu - Advances in Neural Information Processing Systems, 2025 - proceedings.neurips.cc
Transformers can efficiently learn in-context from example demonstrations. Most existing
theoretical analyses have studied the in-context learning (ICL) ability of transformers for linear …
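
As a rough illustration of the setting such analyses consider (a sketch of my own; the prompt layout, dimensions, and linear task are assumptions, not taken from the paper), in-context learning feeds the model demonstration pairs $(x_i, y_i)$ followed by a query $x$, all within a single sequence:

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_demos = 8, 16  # input dimension and number of demonstrations (assumed)

# Draw one task: a hidden linear map w, then demonstration pairs (x_i, y_i).
w = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n_demos, d))
y = X @ w

# Pack demonstrations plus the query into one sequence, one (x, y) "token"
# per row; the query row carries a placeholder label of 0.
x_query = rng.standard_normal(d)
prompt = np.vstack([np.hstack([X, y[:, None]]),
                    np.hstack([x_query, 0.0])])

# A pretrained transformer would map `prompt` to a prediction for x_query;
# here we only exhibit the sequence shape the model consumes.
print(prompt.shape)  # (n_demos + 1, d + 1)
```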

The computational complexity of learning Gaussian single-index models

A Damian, L Pillaud-Vivien, JD Lee, J Bruna - arXiv preprint arXiv …, 2024 - arxiv.org
Single-Index Models are high-dimensional regression problems with planted structure,
whereby labels depend on an unknown one-dimensional projection of the input via a …
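
A minimal data-generating sketch of this function class (my own illustration; the cubic Hermite link and noise level are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 100, 1000  # ambient dimension and sample size (assumed)

# Planted direction w on the sphere; Gaussian inputs x ~ N(0, I_d).
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
X = rng.standard_normal((n, d))

# Labels depend on x only through the 1-d projection <w, x>; the link
# sigma is an arbitrary example here, the Hermite polynomial He_3.
sigma = lambda z: z**3 - 3 * z
y = sigma(X @ w) + 0.1 * rng.standard_normal(n)
```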

On the complexity of learning sparse functions with statistical and gradient queries

N Joshi, T Misiakiewicz, N Srebro - arXiv preprint arXiv:2407.05622, 2024 - arxiv.org
The goal of this paper is to investigate the complexity of gradient algorithms when learning
sparse functions (juntas). We introduce a type of Statistical Queries ($\mathsf{SQ}$), which …
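
For concreteness, a hedged sketch of both objects the abstract names: a $k$-junta, whose label depends on only $k$ of the $d$ coordinates, and an $\mathsf{SQ}$ oracle that answers expectation queries up to a tolerance $\tau$ (the parity link and all parameters are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, n = 30, 3, 1000  # ambient dimension, junta size, sample size (assumed)

# A k-junta: the label depends only on a hidden subset S of k coordinates;
# here the link on those coordinates is a parity (one possible choice).
S = rng.choice(d, size=k, replace=False)
X = rng.choice([-1, 1], size=(n, d))
y = np.prod(X[:, S], axis=1)

# A statistical-query algorithm never touches (X, y) directly: it only
# receives expectations E[phi(x, y)], each corrupted up to tolerance tau.
def sq_oracle(phi, tau=0.01):
    return phi(X, y).mean() + rng.uniform(-tau, tau)

# Example query: the correlation of the label with coordinate 0.
print(sq_oracle(lambda X, y: y * X[:, 0]))
```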

Repetita iuvant: Data repetition allows SGD to learn high-dimensional multi-index functions

L Arnaboldi, Y Dandi, F Krzakala, L Pesce… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural networks can identify low-dimensional relevant structures within high-dimensional
noisy data, yet our mathematical understanding of how they do so remains scarce. Here, we …
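
"Repetita iuvant" is Latin for "repetition helps". The protocol contrast at stake can be sketched in a few lines (a toy of my own; the even single-index target, step size, and spherical normalization are assumptions): multi-pass SGD revisits the same batch, while one-pass SGD sees each sample once.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, lr = 20, 200, 0.01  # dimension, batch size, step size (assumed)

# An even single-index target (information exponent 2), fit by a matched
# single neuron; whether extra passes over the same batch help is the
# phenomenon the paper studies.
w_star = rng.standard_normal(d); w_star /= np.linalg.norm(w_star)
X = rng.standard_normal((n, d))
y = (X @ w_star) ** 2

def sgd(n_passes):
    w = rng.standard_normal(d); w /= np.linalg.norm(w)
    for _ in range(n_passes):                      # n_passes > 1 = data repetition
        for x_i, y_i in zip(X, y):
            resid = (x_i @ w) ** 2 - y_i
            w -= lr * resid * 2 * (x_i @ w) * x_i  # squared-loss gradient step
            w /= np.linalg.norm(w)                 # project back onto the sphere
    return abs(w @ w_star)                         # overlap with planted direction

print(sgd(n_passes=1), sgd(n_passes=10))
```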

Learning sum of diverse features: computational hardness and efficient gradient-based training for ridge combinations

K Oko, Y Song, T Suzuki, D Wu - arXiv preprint arXiv:2406.11828, 2024 - arxiv.org
We study the computational and sample complexity of learning a target function
$f_*:\mathbb{R}^d\to\mathbb{R}$ with additive structure, that is, $f_*(x)=\frac{1}{\sqrt…
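
The displayed target is truncated above; an additive "ridge combination" consistent with the title would sum one-dimensional links applied along orthonormal directions. A hedged generator (the Hermite links and all parameters are my choices, not the paper's definition):

```python
import numpy as np

rng = np.random.default_rng(0)
d, M, n = 64, 8, 1000  # dimension, number of ridge directions, samples (assumed)

# M orthonormal directions v_1, ..., v_M (columns of V) and M "diverse"
# one-dimensional links, here the Hermite polynomials He_2, ..., He_{M+1}.
V, _ = np.linalg.qr(rng.standard_normal((d, M)))
links = [lambda z, k=k: np.polynomial.hermite_e.hermeval(z, [0] * k + [1])
         for k in range(2, M + 2)]

# Additive target: average the links, each applied along its own direction.
X = rng.standard_normal((n, d))
Z = X @ V                                       # (n, M) ridge projections
y = sum(g(Z[:, m]) for m, g in enumerate(links)) / np.sqrt(M)
```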

Learning orthogonal multi-index models: A fine-grained information exponent analysis

Y Ren, JD Lee - arXiv preprint arXiv:2410.09678, 2024 - arxiv.org
The information exponent (Ben Arous et al. [2021]), which is equivalent to the lowest degree
in the Hermite expansion of the link function for Gaussian single-index models, has played …
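
Reading that definition off directly: the information exponent of a link $\sigma$ is the smallest $k \ge 1$ with nonzero Hermite coefficient $\mathbb{E}[\sigma(z)\,\mathrm{He}_k(z)]$ for $z \sim \mathcal{N}(0,1)$. A quadrature sketch (the example links and tolerance are mine):

```python
import numpy as np

def information_exponent(sigma, k_max=8, tol=1e-8):
    """Smallest k >= 1 with E[sigma(z) He_k(z)] != 0 for z ~ N(0, 1)."""
    # Gauss-Hermite quadrature integrates against exp(-t^2); substituting
    # z = sqrt(2) t turns it into an integral against the standard Gaussian.
    t, w = np.polynomial.hermite.hermgauss(80)
    z = np.sqrt(2) * t
    w = w / np.sqrt(np.pi)
    for k in range(1, k_max + 1):
        he_k = np.polynomial.hermite_e.hermeval(z, [0] * k + [1])
        if abs(np.sum(w * sigma(z) * he_k)) > tol:
            return k
    return None

print(information_exponent(lambda z: z**3))      # 1: odd link has He_1 mass
print(information_exponent(lambda z: z**2 - 1))  # 2: pure He_2 component
```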

A random matrix theory perspective on the spectrum of learned features and asymptotic generalization capabilities

Y Dandi, L Pesce, H Cui, F Krzakala, YM Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
A key property of neural networks is their capacity to adapt to data during training. Yet,
our current mathematical understanding of feature learning and its relationship to …
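
A minimal experiment in the spirit of this line of work (every architectural choice below is mine, not the paper's setting): take a two-layer network with random first-layer weights, apply one large full-batch gradient step to that layer, and compare the spectrum of the feature Gram matrix before and after; feature learning shows up as outlier eigenvalues leaving the bulk.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p, n, eta = 100, 150, 200, 5.0  # dims, width, samples, step size (assumed)

# Data from a single-index teacher; student features are tanh(W x / sqrt(d)).
w_star = rng.standard_normal(d) / np.sqrt(d)
X = rng.standard_normal((n, d))
y = np.tanh(X @ w_star)

W = rng.standard_normal((p, d))
a = rng.standard_normal(p) / np.sqrt(p)

def features(W):
    return np.tanh(X @ W.T / np.sqrt(d))

# One large full-batch gradient step on the first layer (squared loss).
resid = features(W) @ a - y                    # (n,) residuals
dphi = 1 - features(W) ** 2                    # tanh' at the pre-activations
grad_W = (dphi * resid[:, None] * a).T @ X / (n * np.sqrt(d))
W1 = W - eta * grad_W

# Spectra of the feature Gram matrix before/after the step.
for F in (features(W), features(W1)):
    eigs = np.linalg.eigvalsh(F @ F.T / p)
    print(eigs[-3:])                           # top of the spectrum
```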

Learning Gaussian multi-index models with gradient flow: Time complexity and directional convergence

B Simsek, A Bendjeddou, D Hsu - arXiv preprint arXiv:2411.08798, 2024 - arxiv.org
This work focuses on the gradient flow dynamics of a neural network model that uses the
correlation loss to approximate a multi-index function on high-dimensional standard …
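
A discretized caricature of these dynamics (my toy instance; the single planted direction, He_3 link, batch estimator, and Euler step are assumptions): projected gradient descent on the population correlation loss $-\mathbb{E}[\sigma(\langle w, x\rangle)\,\sigma(\langle w_*, x\rangle)]$, estimated on fresh batches.

```python
import numpy as np

rng = np.random.default_rng(0)
d, dt, steps = 50, 0.1, 2000  # dimension, Euler step, iterations (assumed)

# Multi-index target restricted here to one planted direction for simplicity.
w_star = rng.standard_normal(d); w_star /= np.linalg.norm(w_star)
sigma = lambda z: z ** 3 - 3 * z        # He_3 link (my choice)
dsigma = lambda z: 3 * z ** 2 - 3

w = rng.standard_normal(d); w /= np.linalg.norm(w)
for _ in range(steps):
    X = rng.standard_normal((512, d))   # fresh batch approximates E[...]
    # gradient of -E[sigma(<w,x>) sigma(<w*,x>)] with respect to w
    g = -(dsigma(X @ w) * sigma(X @ w_star)) @ X / len(X)
    g -= (g @ w) * w                    # project onto the sphere's tangent space
    w -= dt * g                         # forward-Euler gradient-flow step
    w /= np.linalg.norm(w)

print(abs(w @ w_star))                  # directional convergence toward w_star
```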

Neural network learns low-dimensional polynomials with SGD near the information-theoretic limit

JD Lee, K Oko, T Suzuki, D Wu - arXiv preprint arXiv:2406.01581, 2024 - arxiv.org
We study the problem of gradient descent learning of a single-index target function
$f_*(\boldsymbol{x})=\sigma_*(\langle\boldsymbol{x},\boldsymbol…
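
The truncated display is a single-index target, presumably of the form $f_*(x)=\sigma_*(\langle x,\theta\rangle)$ for some direction $\theta$. A one-pass online SGD sketch for such a target (the even link, step size, and spherical normalization are my choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, steps, lr = 100, 20_000, 0.5  # dimension, SGD steps, step size (assumed)

# Single-index teacher; the student is a matched single neuron trained
# online: one fresh Gaussian sample per step, never reused (one pass).
theta = rng.standard_normal(d); theta /= np.linalg.norm(theta)
sigma = lambda z: z ** 2 - 1          # example even link (my choice)
dsigma = lambda z: 2 * z

w = rng.standard_normal(d); w /= np.linalg.norm(w)
for _ in range(steps):
    x = rng.standard_normal(d)
    resid = sigma(x @ w) - sigma(x @ theta)
    w -= (lr / d) * resid * dsigma(x @ w) * x   # squared-loss SGD step
    w /= np.linalg.norm(w)                      # spherical normalization

print(abs(w @ theta))  # overlap with the planted direction
```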

Gradient dynamics for low-rank fine-tuning beyond kernels

AK Dayi, S Chen - arXiv preprint arXiv:2411.15385, 2024 - arxiv.org
LoRA has emerged as one of the de facto methods for fine-tuning foundation models with
low computational cost and memory footprint. The idea is to train only a low-rank …
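
The low-rank idea in a few lines (a generic sketch of LoRA-style training, not this paper's analysis setting; dimensions, rank, and the toy task are assumptions): freeze the pretrained weight $W_0$ and train only a rank-$r$ correction $BA$.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, lr = 64, 64, 4, 0.1  # layer dims, LoRA rank, step size (assumed)

# Frozen pretrained weight plus a trainable rank-r update: W = W0 + B @ A.
W0 = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
A = rng.standard_normal((r, d_in)) / np.sqrt(d_in)
B = np.zeros((d_out, r))              # zero init: fine-tuning starts exactly at W0

# A toy fine-tuning task: the target layer is a small perturbation of W0.
X = rng.standard_normal((256, d_in))
Y = X @ (W0 + 0.1 * rng.standard_normal((d_out, d_in))).T

for _ in range(200):
    resid = X @ (W0 + B @ A).T - Y
    G = resid.T @ X / len(X)          # gradient of the squared loss w.r.t. W
    gB, gA = G @ A.T, B.T @ G         # chain rule through W = W0 + B @ A
    B -= lr * gB                      # only the 2 * r * d LoRA factors move;
    A -= lr * gA                      # W0 itself is never updated

print(np.mean((X @ (W0 + B @ A).T - Y) ** 2))  # fine-tuning loss after training
```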