On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org

Stochastic training is not necessary for generalization

J Geiping, M Goldblum, PE Pope, M Moeller… - arXiv preprint arXiv …, 2021 - arxiv.org
It is widely believed that the implicit regularization of SGD is fundamental to the impressive
generalization behavior we observe in neural networks. In this work, we demonstrate that …

Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning

CH Martin, MW Mahoney - Journal of Machine Learning Research, 2021 - jmlr.org
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural
Networks (DNNs), including both production quality, pre-trained models such as AlexNet …
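
A minimal sketch of the kind of weight-matrix spectral analysis the snippet describes, assuming the object of study is the empirical spectral density (ESD) of a layer's correlation matrix X = WᵀW/N; the Gaussian baseline, matrix shape, and the Marchenko-Pastur comparison below are illustrative choices, not details taken from the paper.

```python
import numpy as np

def esd(W):
    """Eigenvalues of the correlation matrix X = W^T W / N for an N x M weight matrix W."""
    N, _ = W.shape
    X = (W.T @ W) / N
    return np.linalg.eigvalsh(X)

# Baseline: an i.i.d. Gaussian matrix of the same shape. Its ESD follows the
# Marchenko-Pastur law with upper bulk edge sigma^2 * (1 + sqrt(M/N))^2.
rng = np.random.default_rng(0)
N, M, sigma = 4096, 1024, 1.0
W_random = rng.normal(0.0, sigma, size=(N, M))
eigs = esd(W_random)

q = M / N
mp_edge = sigma**2 * (1.0 + np.sqrt(q)) ** 2
print(f"max eigenvalue: {eigs.max():.3f}  (MP bulk edge ~= {mp_edge:.3f})")
print(f"fraction above MP edge: {(eigs > mp_edge).mean():.4f}")

# For a trained layer, the same computation would be applied to the learned weight
# matrix (e.g., a fully connected layer of a pre-trained model); eigenvalues far above
# the MP bulk edge, or a heavy right tail, are read as signatures of learned structure
# rather than random noise.
```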

Which algorithmic choices matter at which batch sizes? Insights from a noisy quadratic model

G Zhang, L Li, Z Nado, J Martens… - Advances in neural …, 2019 - proceedings.neurips.cc
Increasing the batch size is a popular way to speed up neural network training, but beyond
some critical batch size, larger batch sizes yield diminishing returns. In this work, we study …
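
A toy simulation in the spirit of the noisy quadratic model the snippet refers to, showing steps-to-target versus batch size; the Hessian spectrum, noise model, learning-rate grid, and loss target below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def steps_to_target(h, c, v0, lr, batch_size, target, max_steps=10**7):
    """Closed-form SGD dynamics on a noisy quadratic model: per dimension,
    E[theta_i^2]_{t+1} = (1 - lr*h_i)^2 * E[theta_i^2]_t + lr^2 * c_i / B.
    Returns the smallest t with expected loss 0.5 * sum(h_i * E[theta_i^2]_t) <= target."""
    a = (1.0 - lr * h) ** 2                             # per-dimension contraction factor
    v_inf = (lr ** 2) * c / batch_size / (1.0 - a)      # steady-state E[theta_i^2]
    def loss(t):
        v_t = a ** t * (v0 - v_inf) + v_inf
        return 0.5 * np.sum(h * v_t)
    if loss(max_steps) > target:
        return None                                     # noise floor too high at this lr
    lo, hi = 0, max_steps
    while lo < hi:                                      # binary search; loss decreases in t here
        mid = (lo + hi) // 2
        if loss(mid) <= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

# Illustrative setup: ill-conditioned diagonal Hessian, gradient-noise covariance
# proportional to the Hessian (a common NQM assumption), unit initialization.
dims = 1000
h = 1.0 / np.arange(1, dims + 1)
c = h.copy()
v0 = np.ones(dims)
target = 0.01 * 0.5 * np.sum(h * v0)                    # 1% of the initial expected loss

lr_grid = np.logspace(-4, np.log10(1.9), 40)            # stable range: lr * h_max < 2
for B in [1, 4, 16, 64, 256, 1024, 4096]:
    results = [steps_to_target(h, c, v0, lr, B, target) for lr in lr_grid]
    best = min(r for r in results if r is not None)     # best fixed learning rate for this B
    print(f"batch size {B:5d}: {best:8d} steps to 1% of initial loss")
# At small B, doubling the batch size roughly halves the number of steps; past a
# critical batch size the curve flattens and larger batches yield diminishing returns.
```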