Dive into deep learning

A Zhang, ZC Lipton, M Li, AJ Smola - arXiv preprint arXiv:2106.11342, 2021 - arxiv.org
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …

Stochastic gradient descent as approximate Bayesian inference

S Mandt, MD Hoffman, DM Blei - Journal of Machine Learning Research, 2017 - jmlr.org
Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a
Markov chain with a stationary distribution. With this perspective, we derive several new …
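
The claim that constant-step-size SGD behaves like a Markov chain with a stationary distribution is easy to see on a toy problem. The following is a minimal, illustrative Python sketch (my construction, not code from the paper): SGD with Gaussian gradient noise on a one-dimensional quadratic stops converging and instead fluctuates around the minimum, with a spread set by the learning rate.

import numpy as np

# Toy setup (assumed for illustration): f(theta) = 0.5 * a * theta^2, with
# gradient observations corrupted by zero-mean Gaussian noise of std sigma.
a, sigma = 2.0, 1.0
rng = np.random.default_rng(0)

def noisy_grad(theta):
    return a * theta + sigma * rng.standard_normal()

def run_constant_sgd(lr, steps=50_000, burn_in=10_000, theta0=5.0):
    theta, tail = theta0, []
    for t in range(steps):
        theta -= lr * noisy_grad(theta)
        if t >= burn_in:
            tail.append(theta)
    return np.asarray(tail)

# A larger learning rate gives a wider stationary spread around the optimum 0;
# for this linear-Gaussian chain the stationary variance is lr*sigma^2 / (a*(2 - lr*a)).
for lr in (0.01, 0.1):
    samples = run_constant_sgd(lr)
    print(f"lr={lr}: mean={samples.mean():+.3f}, std={samples.std():.3f}")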

A variational perspective on accelerated methods in optimization

A Wibisono, AC Wilson… - Proceedings of the National Academy of Sciences, 2016 - National Acad Sciences
Accelerated gradient methods play a central role in optimization, achieving optimal rates in
many settings. Although many generalizations and extensions of Nesterov's original …

Understanding the acceleration phenomenon via high-resolution differential equations

B Shi, SS Du, MI Jordan, WJ Su - Mathematical Programming, 2022 - Springer
Gradient-based optimization algorithms can be studied from the perspective of limiting
ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not …
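
For context, a standard result in this ODE literature (not a quotation from the paper): taking the step size s of Nesterov's accelerated gradient method to zero for a smooth convex f yields the low-resolution limiting ODE

\ddot{X}(t) + \frac{3}{t}\dot{X}(t) + \nabla f(X(t)) = 0, \qquad X(0) = x_0, \quad \dot{X}(0) = 0.

The high-resolution analysis referenced above keeps terms of order \sqrt{s}, which introduces a gradient-correction term of the form \sqrt{s}\,\nabla^2 f(X(t))\,\dot{X}(t) and separates Nesterov's method from Polyak's heavy-ball method; see the paper for the exact equations.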

Why momentum really works

G Goh - Distill, 2017 - distill.pub
Interactive article with adjustable step-size (α = 0.02) and momentum (β = 0.99) parameters. We often think of Momentum as a means of dampening …
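
As an illustration of the update being discussed, here is a minimal heavy-ball/momentum sketch in Python (my own toy example using the article's default slider values α = 0.02 and β = 0.99; the quadratic and step count are assumptions chosen to make the ill-conditioned case visible):

import numpy as np

# Minimize f(w) = 0.5 * w^T A w, where A has one very flat direction.
A = np.diag([0.001, 50.0])         # highly ill-conditioned quadratic
grad = lambda w: A @ w

def gd(w, lr=0.02, steps=2000):
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def momentum(w, lr=0.02, beta=0.99, steps=2000):
    z = np.zeros_like(w)
    for _ in range(steps):
        z = beta * z + grad(w)     # decaying accumulation of past gradients
        w = w - lr * z             # step along the accumulated direction
    return w

w0 = np.array([1.0, 1.0])
print("distance to optimum, GD:      ", np.linalg.norm(gd(w0)))
print("distance to optimum, momentum:", np.linalg.norm(momentum(w0)))

On this example plain gradient descent barely moves along the flat direction, while the momentum iterate accumulates many small gradients and makes real progress, which is the behaviour the article visualizes.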

Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be

F Kunstner, J Chen, JW Lavington… - arXiv preprint arXiv …, 2023 - arxiv.org
The success of the Adam optimizer on a wide array of architectures has made it the default
in settings where stochastic gradient descent (SGD) performs poorly. However, our …
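
For readers unfamiliar with the term, sign descent updates each coordinate using only the sign of its gradient. A schematic comparison of the three update rules in Python (an illustrative sketch; the hyperparameters and gradients below are made up and are not the paper's experimental setup):

import numpy as np

def sgd_step(w, g, lr=0.1):
    return w - lr * g

def sign_descent_step(w, g, lr=0.1):
    # Ignore gradient magnitudes; move a fixed amount per coordinate.
    return w - lr * np.sign(g)

def adam_step(w, g, state, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m, v, t = state
    t += 1
    m = b1 * m + (1 - b1) * g                 # first-moment (mean) estimate
    v = b2 * v + (1 - b2) * g**2              # second-moment estimate
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

w = np.zeros(3)
g = np.array([1e-3, 1.0, -10.0])              # gradient coordinates of very different scales
print("SGD: ", sgd_step(w, g))
print("Sign:", sign_descent_step(w, g))
w_adam, _ = adam_step(w, g, (np.zeros(3), np.zeros(3), 0))
print("Adam:", w_adam)

Note that on the very first step from a zero state, Adam's bias-corrected update reduces (up to eps) to -lr * sign(g), which is one elementary way to see the connection between Adam and sign descent that the title alludes to.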

Nonparametric stochastic approximation with large step-sizes

A Dieuleveut, F Bach - 2016 - projecteuclid.org
We consider the random-design least-squares regression problem within the reproducing
kernel Hilbert space (RKHS) framework. Given a stream of independent and identically …

Harder, better, faster, stronger convergence rates for least-squares regression

A Dieuleveut, N Flammarion, F Bach - Journal of Machine Learning Research, 2017 - jmlr.org
We consider the optimization of a quadratic objective function whose gradients are only
accessible through a stochastic oracle that returns the gradient at any given point plus a …
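
The "stochastic oracle" setting described here is easy to simulate: each query returns the true gradient of a quadratic plus zero-mean noise, and Polyak-Ruppert averaging of the SGD iterates (a standard tool in this line of work) suppresses the noise. A hedged Python sketch with all problem parameters invented for illustration:

import numpy as np

rng = np.random.default_rng(1)

# Quadratic objective f(w) = 0.5 * w^T H w - b^T w with minimizer w_star.
d = 5
H = np.diag(np.linspace(0.1, 1.0, d))
w_star = rng.standard_normal(d)
b = H @ w_star

def oracle(w, noise_std=0.5):
    # True gradient plus zero-mean Gaussian noise, as in the setting described.
    return H @ w - b + noise_std * rng.standard_normal(d)

def sgd_with_averaging(lr=0.5, steps=20_000):
    w = np.zeros(d)
    w_bar = np.zeros(d)
    for t in range(1, steps + 1):
        w = w - lr * oracle(w)
        w_bar += (w - w_bar) / t   # running Polyak-Ruppert average of the iterates
    return w, w_bar

w_last, w_avg = sgd_with_averaging()
print("error of last iterate:    ", np.linalg.norm(w_last - w_star))
print("error of averaged iterate:", np.linalg.norm(w_avg - w_star))

With a constant step size the last iterate keeps fluctuating, while the averaged iterate settles much closer to the minimizer.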

A variational analysis of stochastic gradient algorithms

S Mandt, M Hoffman, D Blei - International Conference on Machine Learning, 2016 - proceedings.mlr.press
Stochastic Gradient Descent (SGD) is an important algorithm in machine learning.
With constant learning rates, it is a stochastic process that, after an initial phase of …
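
A compact way to state the kind of result this line of work establishes (written schematically from memory, so the exact constants and scalings should be checked against the papers): near an optimum \theta^*, constant-step-size SGD with learning rate \varepsilon and gradient-noise covariance BB^\top is approximated by an Ornstein-Uhlenbeck process

d\theta(t) = -\varepsilon A\,(\theta(t) - \theta^*)\,dt + \varepsilon\, B\, dW(t),

where A is the Hessian of the loss at \theta^*. Its stationary covariance \Sigma solves A\Sigma + \Sigma A^\top = \varepsilon\, BB^\top, so the spread of the stationary distribution grows with the learning rate, which is what lets a tuned constant learning rate turn SGD into an approximate posterior sampler.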

Dissipativity theory for Nesterov's accelerated method

B Hu, L Lessard - International Conference on Machine Learning, 2017 - proceedings.mlr.press
In this paper, we adapt the control theoretic concept of dissipativity theory to provide a
natural understanding of Nesterov's accelerated method. Our theory ties rigorous …
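
For reference, one common way to write Nesterov's accelerated method for an L-smooth convex function f (the standard textbook form of the iteration that Lyapunov/dissipativity certificates of this kind are built around; not copied from the paper):

x_{k+1} = y_k - \tfrac{1}{L}\,\nabla f(y_k), \qquad y_{k+1} = x_{k+1} + \beta_k\,(x_{k+1} - x_k),

with momentum weights such as \beta_k = \frac{k-1}{k+2} in the convex case, or the constant \beta = \frac{\sqrt{L} - \sqrt{\mu}}{\sqrt{L} + \sqrt{\mu}} when f is \mu-strongly convex.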