On efficient training of large-scale deep learning models: A literature review

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org

Stochastic training is not necessary for generalization

J Geiping, M Goldblum, PE Pope, M Moeller… - arXiv preprint arXiv …, 2021 - arxiv.org
It is widely believed that the implicit regularization of SGD is fundamental to the impressive
generalization behavior we observe in neural networks. In this work, we demonstrate that …

Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning

CH Martin, MW Mahoney - Journal of Machine Learning Research, 2021 - jmlr.org
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural
Networks (DNNs), including both production quality, pre-trained models such as AlexNet …
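
A minimal sketch of the kind of weight-matrix spectral analysis the snippet describes, assuming the object of study is the empirical spectral density (ESD) of a layer's correlation matrix X = WᵀW/N; the Gaussian baseline, matrix shape, and the Marchenko-Pastur comparison below are illustrative choices, not details taken from the paper.

```python
import numpy as np

def esd(W):
    """Eigenvalues of the correlation matrix X = W^T W / N for an N x M weight matrix W."""
    N, _ = W.shape
    X = (W.T @ W) / N
    return np.linalg.eigvalsh(X)

# Baseline: an i.i.d. Gaussian matrix of the same shape. Its ESD follows the
# Marchenko-Pastur law with upper bulk edge sigma^2 * (1 + sqrt(M/N))^2.
rng = np.random.default_rng(0)
N, M, sigma = 4096, 1024, 1.0
W_random = rng.normal(0.0, sigma, size=(N, M))
eigs = esd(W_random)

q = M / N
mp_edge = sigma**2 * (1.0 + np.sqrt(q)) ** 2
print(f"max eigenvalue: {eigs.max():.3f}  (MP bulk edge ~= {mp_edge:.3f})")
print(f"fraction above MP edge: {(eigs > mp_edge).mean():.4f}")

# For a trained layer, the same computation would be applied to the learned weight
# matrix (e.g., a fully connected layer of a pre-trained model); eigenvalues far above
# the MP bulk edge, or a heavy right tail, are read as signatures of learned structure
# rather than random noise.
```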

Which algorithmic choices matter at which batch sizes? Insights from a noisy quadratic model

G Zhang, L Li, Z Nado, J Martens… - Advances in neural …, 2019 - proceedings.neurips.cc
Increasing the batch size is a popular way to speed up neural network training, but beyond
some critical batch size, larger batch sizes yield diminishing returns. In this work, we study …
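
A toy simulation in the spirit of the noisy quadratic model the snippet refers to, showing steps-to-target versus batch size; the Hessian spectrum, noise model, learning-rate grid, and loss target below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def steps_to_target(h, c, v0, lr, batch_size, target, max_steps=10**7):
    """Closed-form SGD dynamics on a noisy quadratic model: per dimension,
    E[theta_i^2]_{t+1} = (1 - lr*h_i)^2 * E[theta_i^2]_t + lr^2 * c_i / B.
    Returns the smallest t with expected loss 0.5 * sum(h_i * E[theta_i^2]_t) <= target."""
    a = (1.0 - lr * h) ** 2                             # per-dimension contraction factor
    v_inf = (lr ** 2) * c / batch_size / (1.0 - a)      # steady-state E[theta_i^2]
    def loss(t):
        v_t = a ** t * (v0 - v_inf) + v_inf
        return 0.5 * np.sum(h * v_t)
    if loss(max_steps) > target:
        return None                                     # noise floor too high at this lr
    lo, hi = 0, max_steps
    while lo < hi:                                      # binary search; loss decreases in t here
        mid = (lo + hi) // 2
        if loss(mid) <= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

# Illustrative setup: ill-conditioned diagonal Hessian, gradient-noise covariance
# proportional to the Hessian (a common NQM assumption), unit initialization.
dims = 1000
h = 1.0 / np.arange(1, dims + 1)
c = h.copy()
v0 = np.ones(dims)
target = 0.01 * 0.5 * np.sum(h * v0)                    # 1% of the initial expected loss

lr_grid = np.logspace(-4, np.log10(1.9), 40)            # stable range: lr * h_max < 2
for B in [1, 4, 16, 64, 256, 1024, 4096]:
    results = [steps_to_target(h, c, v0, lr, B, target) for lr in lr_grid]
    best = min(r for r in results if r is not None)     # best fixed learning rate for this B
    print(f"batch size {B:5d}: {best:8d} steps to 1% of initial loss")
# At small B, doubling the batch size roughly halves the number of steps; past a
# critical batch size the curve flattens and larger batches yield diminishing returns.
```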