Don't use large mini-batches, use local SGD
T Lin, SU Stich, KK Patel, M Jaggi - arXiv preprint arXiv…, 2018 - arxiv.org
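Since the entry above names the algorithm but its snippet is missing, here is a minimal single-process sketch of the local SGD idea: K workers each take H local SGD steps on private data shards, then average their parameters. The synthetic linear-regression task, shard sizes, and hyperparameters are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal sketch of local SGD: K workers run H local steps on their own
# shard, then synchronize by parameter averaging. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)

K, H, eta, rounds = 4, 8, 0.05, 50          # workers, local steps, lr, rounds
w_true = rng.normal(size=5)                  # ground-truth weights (assumed task)

# Each worker holds a private shard of a synthetic regression problem.
shards = []
for _ in range(K):
    X = rng.normal(size=(256, 5))
    y = X @ w_true + 0.1 * rng.normal(size=256)
    shards.append((X, y))

w = np.zeros(5)                              # globally synchronized parameters
for _ in range(rounds):
    local_models = []
    for X, y in shards:
        w_k = w.copy()
        for _ in range(H):                   # H local SGD steps, batch size 32
            idx = rng.choice(len(X), size=32, replace=False)
            grad = 2 * X[idx].T @ (X[idx] @ w_k - y[idx]) / len(idx)
            w_k -= eta * grad
        local_models.append(w_k)
    w = np.mean(local_models, axis=0)        # synchronize: average parameters

print("distance to w_true:", np.linalg.norm(w - w_true))
```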
The anisotropic noise in stochastic gradient descent: Its behavior of escaping from sharp minima and regularization effects
Z Zhu, J Wu, B Yu, L Wu, J Ma - arXiv preprint arXiv…, 2018 - arxiv.org
Stochastic training is not necessary for generalization
J Geiping, M Goldblum, PE Pope, M Moeller… - arXiv preprint arXiv…, 2021 - arxiv.org
It is widely believed that the implicit regularization of SGD is fundamental to the impressive
generalization behavior we observe in neural networks. In this work, we demonstrate that …
A diffusion theory for deep learning dynamics: Stochastic gradient descent exponentially favors flat minima
Z Xie, I Sato, M Sugiyama - arXiv preprint arXiv…, 2020 - arxiv.org
Stochastic Gradient Descent (SGD) and its variants are mainstream methods for training
deep networks in practice. SGD is known to find a flat minimum that often generalizes well …
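As a hedged illustration of the kind of statement this title makes, consider a one-dimensional Kramers-rate heuristic; this is not the paper's theorem, which treats anisotropic, Hessian-dependent SGD noise in full:

```latex
% Model SGD near a minimum a as a diffusion with effective temperature
% T proportional to (learning rate)/(batch size):
\[
  \mathrm{d}\theta_t = -L'(\theta_t)\,\mathrm{d}t + \sqrt{2T}\,\mathrm{d}W_t,
  \qquad T \propto \frac{\eta}{B}.
\]
% Kramers' formula for the mean time to escape over a saddle c with
% barrier height \Delta L = L(c) - L(a):
\[
  \tau_{\mathrm{escape}} \approx
  \frac{2\pi}{\sqrt{L''(a)\,\lvert L''(c)\rvert}}
  \exp\!\left(\frac{\Delta L}{T}\right).
\]
% If the gradient-noise scale near a grows with the curvature L''(a), the
% exponent shrinks for sharp minima, so they are escaped exponentially
% faster; this is the sense in which SGD favors flat minima.
```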
Stochastic modified equations and dynamics of stochastic gradient algorithms I: Mathematical foundations
Q Li, C Tai, W E - Journal of Machine Learning Research, 2019 - jmlr.org
We develop the mathematical foundations of the stochastic modified equations (SME)
framework for analyzing the dynamics of stochastic gradient algorithms, where the latter is …
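Concretely, the SME framework approximates the SGD iteration x_{k+1} = x_k - eta * g(x_k), where g is an unbiased stochastic gradient of f with covariance Sigma(x), by an Ito SDE; a sketch of the framework's basic first-order object:

```latex
% First-order stochastic modified equation: SGD iterates x_k with step
% size \eta are weakly approximated, to order \eta, by X_t at t = k\eta.
\[
  \mathrm{d}X_t = -\nabla f(X_t)\,\mathrm{d}t
  + \sqrt{\eta}\,\Sigma(X_t)^{1/2}\,\mathrm{d}W_t,
\]
% where \Sigma(x) is the covariance of the stochastic gradient at x and
% W_t is a standard Wiener process.
```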
Neural networks as interacting particle systems: Asymptotic convexity of the loss landscape and universal scaling of the approximation error
GM Rotskoff, E Vanden-Eijnden - arXiv preprint arXiv…, 2018 - arxiv.org
Neural networks, a central tool in machine learning, have demonstrated remarkable, high
fidelity performance on image recognition and classification tasks. These successes evince …
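For context, the interacting-particle view writes a two-layer network with n units in mean-field (1/n) scaling; the notation below is mine, a sketch of the setup named in the title:

```latex
% Mean-field scaling of a two-layer network with n units ("particles"):
\[
  f_n(x) = \frac{1}{n}\sum_{i=1}^{n} c_i\,\sigma(x;\theta_i).
\]
% As n \to \infty, the empirical measure over the parameters
% (c_i, \theta_i) evolves by a gradient-flow PDE; the loss becomes convex
% as a functional of that measure, and the approximation error decays at
% a universal O(1/n) rate, matching the title's claims.
```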