Hyper-parameter optimization: A review of algorithms and applications

T Yu, H Zhu - arXiv preprint arXiv:2003.05689, 2020 - arxiv.org
Since deep neural networks were developed, they have made huge contributions to
everyday life. Machine learning provides more rational advice than humans are capable of …
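The review's subject is search algorithms for hyper-parameters (grid and random search, Bayesian optimization, and related methods). For orientation only, a minimal random-search loop might look like the sketch below; the search space, parameter names, and toy objective are illustrative assumptions, not taken from the paper.

```python
import random

def random_search(objective, space, n_trials=20, seed=0):
    """Hypothetical random-search loop over a dict of (low, high) ranges."""
    rng = random.Random(seed)
    best_cfg, best_val = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        val = objective(cfg)                      # e.g. a validation loss
        if val < best_val:
            best_cfg, best_val = cfg, val
    return best_cfg, best_val

# Toy stand-in for a validation loss; the names log_lr and momentum are made up.
space = {"log_lr": (-4.0, -1.0), "momentum": (0.0, 0.99)}
toy_loss = lambda c: (c["log_lr"] + 2.5) ** 2 + (c["momentum"] - 0.9) ** 2
print(random_search(toy_loss, space))
```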

SGD: General analysis and improved rates

RM Gower, N Loizou, X Qian… - International …, 2019 - proceedings.mlr.press
We propose a general yet simple theorem describing the convergence of SGD under the
arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of …
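As one concrete special case of the arbitrary-sampling setting (non-uniform, importance-weighted sampling of individual data points), the sketch below runs constant-stepsize SGD on a toy least-squares problem; the problem instance, stepsize, and sampling weights are illustrative assumptions, not the paper's.

```python
import numpy as np

def sgd_arbitrary_sampling(A, b, probs, lr=0.1, iters=500, seed=0):
    """SGD for (1/n) * sum_i 0.5*(a_i^T x - b_i)^2, sampling row i with probability probs[i].
    Dividing the sampled gradient by n * probs[i] keeps the update unbiased."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.choice(n, p=probs)
        g = A[i] * (A[i] @ x - b[i])      # gradient of the i-th term
        x -= lr * g / (n * probs[i])      # importance-weighted step
    return x

A = np.random.default_rng(1).normal(size=(100, 5))
b = A @ np.ones(5)                        # consistent system, solution = all-ones
probs = np.linalg.norm(A, axis=1) ** 2
probs /= probs.sum()                      # sample rows proportionally to their squared norm
print(np.round(sgd_arbitrary_sampling(A, b, probs), 2))
```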

An improved analysis of stochastic gradient descent with momentum

Y Liu, Y Gao, W Yin - Advances in Neural Information …, 2020 - proceedings.neurips.cc
SGD with momentum (SGDM) has been widely used in many machine learning tasks, and
it is often applied with dynamic stepsizes and momentum weights tuned in a stagewise …
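A minimal sketch of the SGDM update analyzed here (heavy-ball momentum with a stagewise stepsize); the toy noisy quadratic and the halving schedule are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def sgdm(grad, x0, lr=0.1, beta=0.9, iters=150):
    """Heavy-ball SGD with momentum: v <- beta*v + g, x <- x - lr_t*v,
    with the stepsize halved every 50 iterations as a simple stagewise schedule."""
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    for t in range(iters):
        lr_t = lr * 0.5 ** (t // 50)      # stagewise stepsize (illustrative)
        v = beta * v + grad(x, t)
        x = x - lr_t * v
    return x

# Noisy gradient of a toy quadratic standing in for a stochastic mini-batch gradient.
rng = np.random.default_rng(0)
noisy_grad = lambda x, t: 2.0 * x + 0.01 * rng.normal(size=x.shape)
print(np.round(sgdm(noisy_grad, [5.0, -3.0]), 3))
```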

New insights and perspectives on the natural gradient method

J Martens - Journal of Machine Learning Research, 2020 - jmlr.org
Natural gradient descent is an optimization method traditionally motivated from the
perspective of information geometry, and works well for many applications as an alternative …
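The natural-gradient update preconditions the gradient with the inverse Fisher information matrix, theta <- theta - lr * F^(-1) grad L(theta). A minimal sketch with a damped solve, where the curvature of a toy quadratic stands in for F (both illustrative assumptions, not the paper's method of estimating F):

```python
import numpy as np

def natural_gradient_step(theta, grad, fisher, lr=0.5, damping=1e-3):
    """One natural-gradient step: theta - lr * (F + damping*I)^(-1) @ grad."""
    return theta - lr * np.linalg.solve(fisher + damping * np.eye(len(theta)), grad)

# Toy ill-conditioned quadratic 0.5 * theta^T F theta; its Hessian plays the role of F here.
F = np.diag([100.0, 1.0])
theta = np.array([1.0, 1.0])
for _ in range(10):
    theta = natural_gradient_step(theta, F @ theta, F)
print(np.round(theta, 4))   # both coordinates shrink at the same rate despite the conditioning
```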

Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence

N Loizou, S Vaswani, IH Laradji… - International …, 2021 - proceedings.mlr.press
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly
used in the subgradient method. Although computing the Polyak step-size requires …
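In the interpolation setting the stochastic Polyak step-size takes the form eta_i = (f_i(x) - f_i^*) / (c * ||grad f_i(x)||^2), usually capped at a maximum value. A minimal sketch on a toy consistent least-squares problem (the instance, the cap, and the choice f_i^* = 0 are illustrative assumptions):

```python
import numpy as np

def sps_sgd(A, b, iters=300, c=0.5, lr_max=10.0, seed=0):
    """SGD with a stochastic Polyak step-size on per-sample losses 0.5*(a_i^T x - b_i)^2.
    Step-size: (f_i(x) - f_i^*) / (c * ||grad f_i(x)||^2), capped at lr_max, with f_i^* = 0."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        i = rng.integers(n)
        r = A[i] @ x - b[i]
        fi = 0.5 * r ** 2                 # loss of sample i (its optimum is 0 here)
        g = A[i] * r                      # gradient of that loss
        if g @ g > 1e-12:
            x -= min(fi / (c * (g @ g)), lr_max) * g
    return x

A = np.random.default_rng(1).normal(size=(50, 3))
b = A @ np.array([1.0, -2.0, 0.5])        # interpolation holds by construction
print(np.round(sps_sgd(A, b), 3))
```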

Exponential graph is provably efficient for decentralized deep training

B Ying, K Yuan, Y Chen, H Hu… - Advances in Neural …, 2021 - proceedings.neurips.cc
Decentralized SGD is an emerging training method for deep learning known for its much
lower (and thus faster) communication per iteration; it relaxes the averaging step in parallel …
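A toy check of the one-peer exponential graph's averaging behavior (the local SGD updates are omitted, and the power-of-two node count is an assumption made for simplicity): at round k each node averages with the peer 2^(k mod log2 n) hops away, and the nodes then hold the exact global average after log2(n) rounds, which the sketch below illustrates.

```python
import numpy as np

def one_peer_exponential_average(X, k):
    """One gossip round on a one-peer exponential graph over n nodes (rows of X).
    At round k, node i averages its model with node (i + 2^(k mod log2(n))) mod n."""
    n = X.shape[0]
    hop = 2 ** (k % int(np.log2(n)))
    peer = (np.arange(n) + hop) % n
    return 0.5 * (X + X[peer])

# Eight scalar "models"; after log2(8) = 3 rounds every node holds the global mean 3.5.
X = np.arange(8, dtype=float).reshape(8, 1)
for k in range(3):
    X = one_peer_exponential_average(X, k)
print(X.ravel())
```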

Painless stochastic gradient: Interpolation, line-search, and convergence rates

S Vaswani, A Mishkin, I Laradji… - Advances in neural …, 2019 - proceedings.neurips.cc
Recent works have shown that stochastic gradient descent (SGD) achieves the fast
convergence rates of full-batch gradient descent for over-parameterized models satisfying …
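A minimal sketch of the stochastic Armijo line-search idea this work builds on: each step backtracks on the sampled loss until the sufficient-decrease condition f_i(x - eta*g) <= f_i(x) - c*eta*||g||^2 holds. The interpolating least-squares instance and the backtracking constants below are illustrative assumptions.

```python
import numpy as np

def sgd_armijo(f_i, grad_i, x0, n, iters=200, c=0.1, eta0=1.0, backtrack=0.7, seed=0):
    """SGD whose step-size is chosen by Armijo backtracking on the sampled function."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        i = rng.integers(n)
        g = grad_i(x, i)
        fx, eta = f_i(x, i), eta0
        while f_i(x - eta * g, i) > fx - c * eta * (g @ g) and eta > 1e-8:
            eta *= backtrack              # shrink until sufficient decrease holds
        x = x - eta * g
    return x

# Toy interpolating least-squares problem: every per-sample loss can be driven to zero.
A = np.random.default_rng(1).normal(size=(40, 4))
b = A @ np.ones(4)
f_i = lambda x, i: 0.5 * (A[i] @ x - b[i]) ** 2
grad_i = lambda x, i: A[i] * (A[i] @ x - b[i])
print(np.round(sgd_armijo(f_i, grad_i, np.zeros(4), n=40), 3))
```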

Regularization via mass transportation

S Shafieezadeh-Abadeh, D Kuhn… - Journal of Machine …, 2019 - jmlr.org
The goal of regression and classification methods in supervised learning is to minimize the
empirical risk, that is, the expectation of some loss function quantifying the prediction error …
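As a reminder of the object being regularized, the empirical risk is the average loss over the training sample; a tiny sketch with logistic loss as one concrete choice (the data and loss are illustrative, and the paper's mass-transportation regularizer is not reproduced here):

```python
import numpy as np

def empirical_risk(w, X, y, loss):
    """Empirical risk: the average of a pointwise loss over the training sample."""
    return np.mean([loss(X[i] @ w, y[i]) for i in range(len(y))])

logistic = lambda score, label: np.log1p(np.exp(-label * score))  # one possible loss
X = np.random.default_rng(0).normal(size=(5, 3))
y = np.array([1, -1, 1, 1, -1])
print(empirical_risk(np.zeros(3), X, y, logistic))   # equals log(2) at w = 0
```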

A wavelet-based deep learning pipeline for efficient COVID-19 diagnosis via CT slices

O Attallah, A Samir - Applied Soft Computing, 2022 - Elsevier
The rapid diagnosis of the novel coronavirus disease (COVID-19) is vital to prevent its
propagation and improve therapeutic outcomes. Computed tomography (CT) is believed to …

Quasi-hyperbolic momentum and Adam for deep learning

J Ma, D Yarats - arXiv preprint arXiv:1810.06801, 2018 - arxiv.org
Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep
learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely …
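The QHM update mixes the raw stochastic gradient with its exponential moving average: g_bar <- beta*g_bar + (1-beta)*g, then x <- x - lr*((1-nu)*g + nu*g_bar); setting nu = 0 recovers plain SGD, while nu = 1 recovers momentum SGD in its moving-average form. A minimal sketch on a toy noisy quadratic (the hyper-parameter values are illustrative, not the paper's recommended defaults):

```python
import numpy as np

def qhm(grad, x0, lr=0.1, beta=0.9, nu=0.7, iters=300):
    """Quasi-hyperbolic momentum: step along a nu-weighted mix of the current
    stochastic gradient and its exponential moving average."""
    x = np.array(x0, dtype=float)
    g_bar = np.zeros_like(x)
    for t in range(iters):
        g = grad(x, t)
        g_bar = beta * g_bar + (1.0 - beta) * g
        x = x - lr * ((1.0 - nu) * g + nu * g_bar)
    return x

rng = np.random.default_rng(0)
noisy_grad = lambda x, t: 2.0 * x + 0.01 * rng.normal(size=x.shape)  # toy quadratic
print(np.round(qhm(noisy_grad, [4.0, -2.0]), 3))
```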