Deep learning: a statistical viewpoint

PL Bartlett, A Montanari, A Rakhlin - Acta Numerica, 2021 - cambridge.org
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …

Dataset distillation with infinitely wide convolutional networks

T Nguyen, R Novak, L Xiao… - Advances in Neural …, 2021 - proceedings.neurips.cc
The effectiveness of machine learning algorithms arises from being able to extract useful
features from large amounts of data. As model and dataset sizes increase, dataset …
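
The distilled images in this line of work serve as the support set of a kernel ridge regression (KRR) predictor whose kernel is the infinite-width convolutional NTK. A minimal numpy sketch of that readout, with an RBF kernel standing in for the conv-NTK (which the authors compute with infinite-width tooling) and purely illustrative shapes:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    # Pairwise squared distances -> RBF kernel matrix (stand-in kernel).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def krr_predict(X_support, y_support, X_test, reg=1e-6, kernel=rbf_kernel):
    # Kernel ridge regression: predictions are kernel-weighted
    # combinations of the (distilled) support set.
    K_ss = kernel(X_support, X_support)
    K_ts = kernel(X_test, X_support)
    alpha = np.linalg.solve(K_ss + reg * np.eye(len(X_support)), y_support)
    return K_ts @ alpha

# Toy usage: 10 distilled "images" (flattened), one-hot labels, 5 queries.
rng = np.random.default_rng(0)
X_s, y_s = rng.normal(size=(10, 32 * 32 * 3)), np.eye(10)
X_t = rng.normal(size=(5, 32 * 32 * 3))
preds = krr_predict(X_s, y_s, X_t)   # (5, 10) class scores
```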

High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$ …
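
A minimal JAX sketch of the object under study, assuming ReLU for $\sigma$ and a toy target (both illustrative): one gradient step is taken on the first-layer weights $\boldsymbol{W}$ while the second layer $\boldsymbol{a}$ stays fixed, which is the update whose effect on the representation the paper analyzes.

```python
import jax
import jax.numpy as jnp

N, d = 128, 16                                   # width, input dimension
kW, ka, kx = jax.random.split(jax.random.PRNGKey(0), 3)
W = jax.random.normal(kW, (d, N)) / jnp.sqrt(d)  # first layer (trained)
a = jax.random.normal(ka, (N,))                  # second layer (fixed)

def f(W, x):
    # f(x) = a^T sigma(W^T x) / sqrt(N), with sigma = ReLU here.
    return a @ jax.nn.relu(W.T @ x) / jnp.sqrt(N)

def loss(W, X, y):
    preds = jax.vmap(lambda x: f(W, x))(X)
    return jnp.mean((preds - y) ** 2)

X = jax.random.normal(kx, (256, d))
y = jnp.tanh(X[:, 0])                    # toy single-index target
eta = 1.0                                # the paper studies how this scales
W1 = W - eta * jax.grad(loss)(W, X, y)   # the single step on W
```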

Gradient starvation: A learning proclivity in neural networks

M Pezeshki, O Kaba, Y Bengio… - Advances in …, 2021 - proceedings.neurips.cc
We identify and formalize a fundamental gradient descent phenomenon resulting in a
learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …
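
A hypothetical toy illustration of the flavor of the phenomenon (not the paper's formal setup, which works through an NTK-regime dynamical analysis): two redundant features both predict the label, and once the stronger feature has driven the loss down, the gradient flowing to the weaker feature is starved and its weight stays near zero.

```python
import numpy as np

# Toy data: both features alone predict y, but feature 0 has a far
# larger margin than feature 1.
rng = np.random.default_rng(0)
n = 1000
y = rng.choice([-1.0, 1.0], size=n)
X = np.stack([5.0 * y + rng.normal(size=n),       # strong feature
              0.5 * y + rng.normal(size=n)], 1)   # weak feature

w = np.zeros(2)
for _ in range(2000):
    # sigmoid(-y f(x)), written via tanh for numerical stability
    p = 0.5 * (1.0 - np.tanh(0.5 * y * (X @ w)))
    grad = -(X * (y * p)[:, None]).mean(0)        # logistic-loss gradient
    w -= 0.1 * grad

print(w)   # the weak feature's weight stays small: once the strong
           # feature shrinks the loss, its gradient signal is starved
```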

What can a single attention layer learn? A study through the random features lens

H Fu, T Guo, Y Bai, S Mei - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Attention layers---which map a sequence of inputs to a sequence of outputs---are core
building blocks of the Transformer architecture which has achieved significant …
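
A schematic of the random-features view of a single attention layer, with illustrative shapes and the first token acting as the query: the query-key matrices are frozen at random initialization and only a linear readout over the attention outputs is fit, which is the regime the analysis works in (the paper's exact parameterization and scalings differ).

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, M = 8, 6, 64    # embedding dim, sequence length, random heads

# Frozen random query-key matrices: the "random features".
W = rng.normal(size=(M, d, d)) / np.sqrt(d)

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def features(X):
    # X: (T, d). scores[m, t] = x_1^T W_m x_t; each head returns an
    # attention-weighted average of the sequence -> (M * d,) features.
    scores = np.einsum('d,mde,te->mt', X[0], W, X)
    return (softmax(scores) @ X).reshape(-1)

# Only the linear readout v is trained (ridge regression on toy data).
n = 200
Xs = rng.normal(size=(n, T, d))
ys = np.array([x[0] @ x[1:].mean(0) for x in Xs])   # toy target
Phi = np.stack([features(x) for x in Xs])
v = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(M * d), Phi.T @ ys)
```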

TRAK: Attributing model behavior at scale

SM Park, K Georgiev, A Ilyas, G Leclerc… - arXiv preprint arXiv …, 2023 - arxiv.org
The goal of data attribution is to trace model predictions back to training data. Despite a long
line of work towards this goal, existing approaches to data attribution tend to force users to …
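
A schematic of the gradient-projection idea at the core of TRAK, with synthetic stand-in gradients (in the real method they come from the trained model): per-example gradients are compressed by a random projection, and attribution scores fall out of the resulting feature kernel. The full estimator additionally reweights by loss terms and ensembles over retrained models.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 1000, 32   # train examples, parameters, projection dim

# Stand-in per-example gradients of the model output w.r.t. parameters.
G_train = rng.normal(size=(n, d))
g_test = rng.normal(size=(d,))

# JL-style random projection keeps the gradient features cheap to store.
P = rng.normal(size=(d, k)) / np.sqrt(k)
Phi = G_train @ P             # (n, k) projected train gradients
phi = g_test @ P              # (k,)  projected test gradient

# Score each train example's influence on the test prediction via the
# (regularized) projected-gradient kernel.
scores = Phi @ np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(k), phi)
top = np.argsort(-scores)[:10]   # most influential training examples
```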

LLM inference unveiled: Survey and roofline model insights

Z Yuan, Y Shang, Y Zhou, Z Dong, Z Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a
unique blend of opportunities and challenges. Although the field has expanded and is …
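
A back-of-envelope example of the roofline reasoning such surveys apply, with illustrative A100-class hardware numbers (swap in your own): generating one token in the decode phase reads every weight once, so arithmetic intensity sits far below the hardware ridge point and decoding is memory-bandwidth-bound.

```python
# Illustrative accelerator numbers, roughly A100-class.
peak_flops = 312e12           # dense BF16 FLOP/s
mem_bw = 2.0e12               # HBM bytes/s
ridge = peak_flops / mem_bw   # ~156 FLOP/byte; below this -> memory-bound

# Decode phase: ~2 FLOPs per parameter per token, 2 bytes per fp16 weight.
decode_intensity = 2 / 2      # 1 FLOP/byte, far below the ridge point

# Weight-bandwidth ceiling on decode speed for a 7B-parameter fp16 model.
tokens_per_sec = mem_bw / (2 * 7e9)
print(ridge, decode_intensity, tokens_per_sec)   # ~156.0, 1.0, ~143 tok/s
```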

Finite versus infinite neural networks: an empirical study

J Lee, S Schoenholz, J Pennington… - Advances in …, 2020 - proceedings.neurips.cc
We perform a careful, thorough, and large scale empirical study of the correspondence
between wide neural networks and kernel methods. By doing so, we resolve a variety of …
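
A minimal sketch of the infinite-width side of such a comparison, using the kernel_fn interface of the authors' neural_tangents library (API usage as I understand it): the NTK of an infinitely wide two-layer ReLU network is evaluated in closed form and used for kernel regression, giving the predictor a finite-width network is compared against.

```python
import jax
import jax.numpy as jnp
from neural_tangents import stax

# Infinite-width two-layer ReLU network; Dense's width only matters for
# the finite-width init/apply functions, not for kernel_fn.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(1))

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
x_train = jax.random.normal(k1, (20, 10))
y_train = jnp.sin(x_train[:, :1])
x_test = jax.random.normal(k2, (5, 10))

# Closed-form NTK regression: the converged gradient-descent predictions
# of the infinitely wide network under MSE loss.
k_tt = kernel_fn(x_train, x_train, 'ntk')
k_st = kernel_fn(x_test, x_train, 'ntk')
y_ntk = k_st @ jnp.linalg.solve(k_tt + 1e-6 * jnp.eye(20), y_train)
```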

A primer on Bayesian neural networks: review and debates

J Arbel, K Pitas, M Vladimirova, V Fortuin - arXiv preprint arXiv:2309.16314, 2023 - arxiv.org
Neural networks have achieved remarkable performance across various problem domains,
but their widespread applicability is hindered by inherent limitations such as overconfidence …

Bayesian deep ensembles via the neural tangent kernel

B He, B Lakshminarayanan… - Advances in Neural …, 2020 - proceedings.neurips.cc
We explore the link between deep ensembles and Gaussian processes (GPs) through the
lens of the Neural Tangent Kernel (NTK): a recent development in understanding the …
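
A minimal JAX sketch of the empirical NTK on which the deep-ensembles/GP link rests, for a toy two-layer network: the kernel is the Gram matrix of per-example parameter gradients. The paper's construction goes further, modifying each ensemble member so the ensemble's predictive distribution matches the NTK-GP posterior.

```python
import jax
import jax.numpy as jnp

def init_params(key, d, width):
    k1, k2 = jax.random.split(key)
    return {'W': jax.random.normal(k1, (d, width)) / jnp.sqrt(d),
            'a': jax.random.normal(k2, (width,)) / jnp.sqrt(width)}

def f(params, x):
    # Toy two-layer ReLU network with scalar output.
    return params['a'] @ jax.nn.relu(params['W'].T @ x)

def empirical_ntk(params, X1, X2):
    # NTK(x, x') = <df/dtheta (x), df/dtheta (x')> over all parameters.
    grads = lambda X: jax.vmap(jax.grad(f), in_axes=(None, 0))(params, X)
    flat = lambda tree: jnp.concatenate(
        [g.reshape(g.shape[0], -1) for g in jax.tree_util.tree_leaves(tree)], 1)
    return flat(grads(X1)) @ flat(grads(X2)).T

key = jax.random.PRNGKey(0)
params = init_params(key, d=4, width=256)
X = jax.random.normal(key, (8, 4))
K = empirical_ntk(params, X, X)   # (8, 8) kernel of one network draw
```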