On efficient training of large-scale deep learning models: A literature review
L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - arXiv preprint arXiv …, 2023 - arxiv.org
Stochastic training is not necessary for generalization
J Geiping, M Goldblum, PE Pope, M Moeller… - arXiv preprint arXiv …, 2021 - arxiv.org
It is widely believed that the implicit regularization of SGD is fundamental to the impressive
generalization behavior we observe in neural networks. In this work, we demonstrate that …
Implicit self-regularization in deep neural networks: Evidence from random matrix theory and implications for learning
CH Martin, MW Mahoney - Journal of Machine Learning Research, 2021 - jmlr.org
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural
Networks (DNNs), including both production quality, pre-trained models such as AlexNet …
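As a rough, self-contained illustration of the kind of spectral analysis this work describes (a minimal sketch, not the authors' code: the matrix shape, element scale, and the esd_and_mp_edge helper are assumptions made for this example), the snippet below computes the eigenvalues of the layer correlation matrix X = WᵀW/N for a random Gaussian matrix and compares the largest eigenvalue against the Marchenko–Pastur bulk edge that i.i.d. random weights would obey; for trained layers the paper reports eigenvalues escaping well beyond that edge.

```python
import numpy as np

def esd_and_mp_edge(W):
    """Eigenvalues of the layer correlation matrix X = W^T W / N (N = rows),
    plus the Marchenko-Pastur bulk edge that an i.i.d. random matrix of the
    same shape and element variance would have."""
    N, M = W.shape                        # assume N >= M so X is M x M
    X = W.T @ W / N
    eigs = np.linalg.eigvalsh(X)
    sigma2 = np.var(W)                    # element-wise variance estimate
    mp_edge = sigma2 * (1.0 + np.sqrt(M / N)) ** 2
    return eigs, mp_edge

# Purely random weights: the whole spectrum should sit near or inside the MP bulk.
rng = np.random.default_rng(0)
W_random = rng.normal(0.0, 0.02, size=(1024, 512))
eigs, edge = esd_and_mp_edge(W_random)
print(f"max eigenvalue {eigs.max():.5f} vs MP bulk edge {edge:.5f}")

# Substituting a pre-trained layer's weight matrix for W_random would let one
# look for eigenvalues escaping the bulk, the heavy-tailed signature the paper studies.
```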
Which algorithmic choices matter at which batch sizes? Insights from a noisy quadratic model
G Zhang, L Li, Z Nado, J Martens… - Advances in Neural Information Processing Systems, 2019
Increasing the batch size is a popular way to speed up neural network training, but beyond
some critical batch size, larger batch sizes yield diminishing returns. In this work, we study …
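To make the notion of a critical batch size concrete, here is a minimal sketch under simplified assumptions (the 1-D quadratic, the gradient-noise variance scaling as 1/batch size, the coarse learning-rate grid, and the steps_to_target helper are all constructions for this example, not the paper's actual noisy quadratic setup or tuning protocol). For each batch size it reports the fewest SGD steps, over the learning-rate grid, needed to reach a fixed target loss: the step count shrinks roughly in proportion to the batch size at first, then flattens out once further increases stop helping.

```python
import numpy as np

def steps_to_target(batch_size, lr, target=1e-3, h=1.0, sigma=1.0,
                    theta0=1.0, max_steps=10_000, n_trials=256, seed=0):
    """SGD on the 1-D noisy quadratic L(theta) = 0.5*h*theta**2, where each
    stochastic gradient is h*theta plus noise of std sigma/sqrt(batch_size).
    Returns the first step at which the trial-averaged loss drops below
    `target`, or None if that never happens within max_steps."""
    rng = np.random.default_rng(seed)
    theta = np.full(n_trials, theta0, dtype=float)
    for t in range(1, max_steps + 1):
        noise = rng.normal(0.0, sigma / np.sqrt(batch_size), size=n_trials)
        theta -= lr * (h * theta + noise)          # SGD step on the noisy gradient
        if 0.5 * h * np.mean(theta ** 2) < target:
            return t
    return None

lrs = [2.0 ** -k for k in range(12)]               # coarse learning-rate grid
for b in [1, 4, 16, 64, 256, 1024, 4096]:
    counts = []
    for lr in lrs:
        s = steps_to_target(b, lr)
        if s is not None:
            counts.append(s)
    best = min(counts) if counts else None
    print(f"batch size {b:5d}: best steps to target loss = {best}")
```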