When does preconditioning help or hurt generalization?

S Amari, J Ba, R Grosse, X Li, A Nitanda… - arXiv preprint arXiv …, 2020 - arxiv.org
While second-order optimizers such as natural gradient descent (NGD) often speed up
optimization, their effect on generalization has been called into question. This work presents …

Importance tempering: Group robustness for overparameterized models

Y Lu, W Ji, Z Izzo, L Ying - arXiv preprint arXiv:2209.08745, 2022 - arxiv.org
Although overparameterized models have shown their success on many machine learning
tasks, accuracy can drop on a testing distribution that differs from the training …

Which Minimizer Does My Neural Network Converge To?

M Nonnenmacher, D Reeb, I Steinwart - … 13–17, 2021, Proceedings, Part III …, 2021 - Springer
The loss surface of an overparameterized neural network (NN) possesses many global
minima of zero training error. We explain how common variants of the standard NN training …

[BOOK][B] Optimization: Stochastic thermodynamics, machine learning, and numerical algorithms

NS Wadia - 2022 - search.proquest.com
OPTIMIZATION: STOCHASTIC THERMODYNAMICS, MACHINE LEARNING, AND
NUMERICAL ALGORITHMS by Neha Spenta Wadia. A dissertation submitted …

Training Efficiency and Robustness in Deep Learning

F Faghri - 2022 - search.proquest.com
Deep learning has revolutionized machine learning and artificial intelligence, achieving
superhuman performance on several standard benchmarks. It is well known that deep …