Understanding gradient descent on the edge of stability in deep learning

S Arora, Z Li, A Panigrahi - International Conference on …, 2022 - proceedings.mlr.press
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
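
For context, the stability threshold the EoS literature refers to is the classical one for gradient descent on a quadratic: the iteration diverges once the curvature (sharpness) exceeds 2/LR. A minimal sketch of that threshold on an illustrative 1-D quadratic (nothing here is taken from the paper's experiments):

    def gd_on_quadratic(sharpness, lr, steps=50, x0=1.0):
        # Gradient descent on f(x) = 0.5 * sharpness * x**2.
        # The update x <- (1 - lr * sharpness) * x contracts iff |1 - lr * sharpness| < 1,
        # i.e. iff sharpness < 2 / lr.
        x = x0
        for _ in range(steps):
            x -= lr * sharpness * x
        return x

    lr = 0.1
    print(gd_on_quadratic(sharpness=19.0, lr=lr))  # 19 < 2/lr = 20: shrinks toward 0
    print(gd_on_quadratic(sharpness=21.0, lr=lr))  # 21 > 2/lr = 20: blows up

In the EoS phase reported by Cohen et al., the top Hessian eigenvalue of the training loss rises to roughly this 2/LR value and then hovers there while the loss keeps decreasing non-monotonically.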

Towards theoretically understanding why sgd generalizes better than adam in deep learning

P Zhou, J Feng, C Ma, C Xiong… - Advances in Neural …, 2020 - proceedings.neurips.cc
It is not yet clear why Adam-like adaptive gradient algorithms suffer from worse
generalization performance than SGD despite their faster training speed. This work aims to …
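
For reference, the two update rules being compared, written out in their standard textbook form (this is not code from the paper):

    import numpy as np

    def sgd_step(theta, grad, lr=0.1):
        # Plain SGD: step against the stochastic gradient.
        return theta - lr * grad

    def adam_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Adam: exponential moving averages of the gradient and its square,
        # with bias correction, yield a per-coordinate adaptive step size.
        m, v, t = state
        t += 1
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad ** 2
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        return theta - lr * m_hat / (np.sqrt(v_hat) + eps), (m, v, t)

The question the abstract poses is why the adaptive, per-coordinate scaling in the second rule, which speeds up training, tends to end in solutions that generalize worse than those found by the first.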

Don't use large mini-batches, use local SGD

T Lin, SU Stich, KK Patel, M Jaggi - arXiv preprint arXiv:1808.07217, 2018 - arxiv.org
Mini-batch stochastic gradient methods (SGD) are state of the art for distributed training of
deep neural networks. Drastic increases in the mini-batch sizes have led to key efficiency …
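
The local SGD of the title replaces synchronizing after every mini-batch with occasional model averaging: each worker runs several SGD steps on its own data, and only then are the parameters averaged. A schematic NumPy version (worker loss functions and constants are illustrative, not from the paper):

    import numpy as np

    def local_sgd(grad_fns, theta0, lr=0.05, rounds=100, local_steps=8):
        # grad_fns: one stochastic-gradient function per worker.
        # Each round, every worker takes `local_steps` independent SGD steps,
        # then the models are averaged (the only communication step).
        theta = np.array(theta0, dtype=float)
        for _ in range(rounds):
            local_models = []
            for grad in grad_fns:
                w = theta.copy()
                for _ in range(local_steps):
                    w -= lr * grad(w)
                local_models.append(w)
            theta = np.mean(local_models, axis=0)
        return theta

    # Toy usage: two workers with noisy gradients of f(w) = 0.5 * ||w||^2.
    rng = np.random.default_rng(0)
    workers = [lambda w: w + 0.1 * rng.standard_normal(w.shape) for _ in range(2)]
    print(local_sgd(workers, theta0=[1.0, -2.0]))

This communication pattern is the alternative the title recommends over simply growing the mini-batch size.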

On the origin of implicit regularization in stochastic gradient descent

SL Smith, B Dherin, DGT Barrett, S De - arXiv preprint arXiv:2101.12176, 2021 - arxiv.org
For infinitesimal learning rates, stochastic gradient descent (SGD) follows the path of
gradient flow on the full batch loss function. However, moderately large learning rates can …
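
The limiting statement in the first sentence is the gradient-flow ODE on the full-batch loss C(\omega),

    \dot\omega(t) = -\nabla C(\omega(t)) \qquad (\text{learning rate } \eta \to 0),

while the paper's subject is the leading correction at moderate \eta: backward error analysis yields a modified loss of the form C(\omega) + O(\eta) \cdot (\text{mean squared norm of the mini-batch gradients}), i.e. an implicit penalty on steep directions; the explicit full-batch constant is recalled under the "Implicit gradient regularization" entry below.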

Stochastic gradient descent as approximate bayesian inference

S Mandt, MD Hoffman, DM Blei - Journal of Machine Learning …, 2017 - jmlr.org
Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a
Markov chain with a stationary distribution. With this perspective, we derive several new …
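
The Markov-chain picture is easy to reproduce on a toy quadratic: with a constant learning rate and noisy gradients the iterates never settle at the minimum but fluctuate around it with a stationary variance set by the learning rate and the noise level. The model and constants below are illustrative, not the paper's:

    import numpy as np

    # Constant-LR SGD on f(x) = 0.5 * s * x**2 with additive gradient noise of std sigma:
    #   x <- x - lr * (s * x + noise)
    # This is an AR(1) chain whose stationary variance works out to
    #   lr * sigma**2 / (s * (2 - lr * s)),
    # so smaller steps sample a tighter distribution around the minimum.
    rng = np.random.default_rng(0)
    s, sigma, lr = 1.0, 1.0, 0.1
    x, samples = 0.0, []
    for t in range(200_000):
        x -= lr * (s * x + sigma * rng.standard_normal())
        if t > 10_000:          # discard burn-in
            samples.append(x)
    print("empirical variance:", np.var(samples))
    print("predicted variance:", lr * sigma ** 2 / (s * (2 - lr * s)))

It is this Gaussian-like stationary distribution that the paper reinterprets as an approximate posterior, turning the learning rate and mini-batch size into knobs of an inference procedure.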

Three factors influencing minima in SGD

S Jastrzębski, Z Kenton, D Arpit, N Ballas… - arXiv preprint arXiv …, 2017 - arxiv.org
We investigate the dynamical and convergent properties of stochastic gradient descent
(SGD) applied to Deep Neural Networks (DNNs). Characterizing the relation between …
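
The three factors in question are the learning rate \eta, the batch size B, and the covariance of the gradient noise; the heuristic continuous-time model commonly used to relate them (a modelling approximation, not an exact theorem) is

    \theta_{k+1} = \theta_k - \eta\, g_B(\theta_k), \qquad \mathbb{E}[g_B] = \nabla L(\theta), \qquad \mathrm{Cov}[g_B] \approx \tfrac{1}{B}\, C(\theta),

    d\theta \;\approx\; -\nabla L(\theta)\, dt \;+\; \sqrt{\tfrac{\eta}{B}}\; C(\theta)^{1/2}\, dW_t,

so at this level of approximation the stochasticity enters only through the ratio \eta / B, which is why that ratio is tied to the width of the minima SGD ends up in.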

Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks

P Chaudhari, S Soatto - 2018 Information Theory and …, 2018 - ieeexplore.ieee.org
Stochastic gradient descent (SGD) is widely believed to perform implicit regularization when
used to train deep neural networks, but the precise manner in which this occurs has thus far …

Implicit gradient regularization

DGT Barrett, B Dherin - arXiv preprint arXiv:2009.11162, 2020 - arxiv.org
Gradient descent can be surprisingly good at optimizing deep neural networks without
overfitting and without explicit regularization. We find that the discrete steps of gradient …
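
The calculation behind the abstract can be compressed into one formula from the implicit-gradient-regularization analysis (the \eta/4 constant is the one usually quoted; treat this as a paraphrase rather than the paper's exact statement): a gradient-descent step with learning rate \eta follows, up to O(\eta^3) per step, the gradient flow of the modified loss

    \tilde L(\theta) \;=\; L(\theta) \;+\; \frac{\eta}{4}\, \big\| \nabla L(\theta) \big\|^2,

so the discrete steps implicitly penalize sharp, large-gradient regions even though no explicit regularizer is ever added.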

Understanding the acceleration phenomenon via high-resolution differential equations

B Shi, SS Du, MI Jordan, WJ Su - Mathematical Programming, 2022 - Springer
Gradient-based optimization algorithms can be studied from the perspective of limiting
ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not …
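
For orientation: the low-resolution ODE long associated with Nesterov's accelerated method for convex f (step size s \to 0) is

    \ddot X(t) + \frac{3}{t}\, \dot X(t) + \nabla f(X(t)) = 0,

and a known shortcoming of such low-resolution limits is that Nesterov's method and Polyak's heavy-ball method can share the same limiting ODE. The high-resolution ODEs studied here keep O(\sqrt{s}) terms, schematically

    \ddot X + \frac{3}{t}\, \dot X + \sqrt{s}\, \nabla^2 f(X)\, \dot X + \Big(1 + \frac{3\sqrt{s}}{2t}\Big) \nabla f(X) = 0,

where the coefficients shown are indicative; the structurally important piece is the Hessian-driven damping term \sqrt{s}\, \nabla^2 f(X)\, \dot X, which heavy ball lacks and which is what separates the two algorithms at this resolution.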

What Happens after SGD Reaches Zero Loss? A Mathematical Framework

Z Li, T Wang, S Arora - arXiv preprint arXiv:2110.06914, 2021 - arxiv.org
Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key
challenges in deep learning, especially for overparametrized models, where the local …
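
A sketch of the framework's setting, summarized from this line of work rather than quoted from the paper: for an overparametrized model the global minimizers form a manifold \Gamma = \{\theta : L(\theta) = 0\} rather than isolated points, and once SGD reaches a neighborhood of \Gamma the remaining motion is a much slower drift along \Gamma driven by the gradient noise. In the label-noise instance usually used to illustrate the framework, that limiting drift is a projected gradient flow on the sharpness,

    \dot\theta \;=\; -\, P_{\Gamma}(\theta)\, \nabla\, \mathrm{tr}\big( \nabla^2 L(\theta) \big), \qquad \theta \in \Gamma,

with P_\Gamma the projection onto the tangent space of \Gamma, so the implicit bias is a drift toward flatter points of the zero-loss manifold.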