Hyper-parameter optimization: A review of algorithms and applications
T Yu, H Zhu - arXiv preprint arXiv:2003.05689, 2020 - arxiv.org
Since deep neural networks were developed, they have made huge contributions to
everyday life. Machine learning provides more rational advice than humans are capable of …
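The algorithms surveyed include simple search-based tuners; as a minimal sketch (the two-parameter search space and function names are illustrative, not from the paper), random search over a hypothetical configuration space looks like:

    import random

    def random_search(objective, n_trials=20, seed=0):
        # Hypothetical search space: learning rate and batch size (illustrative only).
        rng = random.Random(seed)
        best_cfg, best_score = None, float("inf")
        for _ in range(n_trials):
            cfg = {
                "lr": 10 ** rng.uniform(-4, -1),          # log-uniform learning rate
                "batch_size": rng.choice([32, 64, 128, 256]),
            }
            score = objective(cfg)                         # e.g. validation loss
            if score < best_score:
                best_cfg, best_score = cfg, score
        return best_cfg, best_score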
SGD: General analysis and improved rates
We propose a general yet simple theorem describing the convergence of SGD under the
arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of …
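The setting is SGD on a finite sum of n component functions, with components drawn from a chosen sampling distribution; a minimal sketch, assuming single-element sampling with importance weighting so the gradient estimate stays unbiased (function names are illustrative, not the paper's framework):

    import numpy as np

    def sgd_arbitrary_sampling(grad_i, x0, n, probs, lr=0.1, steps=100, seed=0):
        # grad_i(x, i): gradient of the i-th component function at x.
        # probs: sampling probabilities over the n components (the chosen sampling).
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        for _ in range(steps):
            i = rng.choice(n, p=probs)
            # Importance-weight the sampled gradient to keep the update unbiased
            # for the average gradient (1/n) * sum_i grad f_i(x).
            x = x - lr * grad_i(x, i) / (n * probs[i])
        return x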
An improved analysis of stochastic gradient descent with momentum
SGD with momentum (SGDM) has been widely applied in many machine learning tasks, and
it is often used with dynamic stepsizes and momentum weights tuned in a stagewise …
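A common form of the SGDM update (the heavy-ball-style step used in most deep learning frameworks; the exact variant and stagewise schedule analyzed in the paper may differ):

    def sgdm_step(x, velocity, grad, lr, momentum=0.9):
        # Heavy-ball style SGD-with-momentum: accumulate gradients into a velocity
        # buffer, then step along the velocity. In stagewise schedules, lr (and
        # sometimes momentum) is dropped by a constant factor at stage boundaries.
        velocity = momentum * velocity + grad
        x = x - lr * velocity
        return x, velocity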
New insights and perspectives on the natural gradient method
J Martens - Journal of Machine Learning Research, 2020 - jmlr.org
Natural gradient descent is an optimization method traditionally motivated from the
perspective of information geometry, and works well for many applications as an alternative …
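A standard statement of the natural gradient update, assuming the usual definition of the Fisher information matrix F (a textbook form, not quoted from the paper):

    \theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t),
    \qquad
    F(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[\nabla_\theta \log p_\theta(x)\, \nabla_\theta \log p_\theta(x)^{\top}\right].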
Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly
used in the subgradient method. Although computing the Polyak step-size requires …
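A minimal sketch of an SPS_max-style step, assuming the sampled optimal value f_i* (often taken as 0 under interpolation) and constants c and eta_max as in the usual formulation; function names are illustrative:

    import numpy as np

    def sps_step(x, loss_i, grad_i, loss_i_star=0.0, c=0.5, eta_max=1.0):
        # Stochastic Polyak step-size: the step length adapts to the gap between the
        # current sampled loss and its optimal value, capped at eta_max.
        g = grad_i(x)
        eta = min((loss_i(x) - loss_i_star) / (c * np.dot(g, g) + 1e-12), eta_max)
        return x - eta * g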
Exponential graph is provably efficient for decentralized deep training
Decentralized SGD is an emerging training method for deep learning, known for its much
lighter (and thus faster) per-iteration communication, which relaxes the averaging step in parallel …
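A minimal sketch of the static exponential graph topology and its one-peer variant, under the common construction where node i links to peers at power-of-two hop distances (mixing weights and the exact schedule follow the paper, not this sketch):

    import math

    def exponential_graph_neighbors(i, n):
        # Static exponential graph: node i connects to peers at hop distances
        # 1, 2, 4, ..., 2^(ceil(log2 n) - 1) on a ring of n nodes.
        tau = max(1, math.ceil(math.log2(n)))
        return sorted({(i + 2 ** k) % n for k in range(tau)} - {i})

    def one_peer_exponential(i, n, t):
        # One-peer variant: cycle through the exponential neighbors over iterations,
        # so each node exchanges with exactly one peer per step.
        tau = max(1, math.ceil(math.log2(n)))
        return (i + 2 ** (t % tau)) % n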
Painless stochastic gradient: Interpolation, line-search, and convergence rates
Recent works have shown that stochastic gradient descent (SGD) achieves the fast
convergence rates of full-batch gradient descent for over-parameterized models satisfying …
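A minimal sketch of a stochastic Armijo backtracking step on the sampled loss, with illustrative constants (the paper's fast rates rely on the interpolation assumption referenced above):

    import numpy as np

    def stochastic_armijo_step(x, f_i, grad_i, eta_init=1.0, c=0.1, beta=0.5, max_backtracks=30):
        # Shrink the step size until the Armijo sufficient-decrease condition holds
        # on the *sampled* loss f_i, then take the step.
        g = grad_i(x)
        fx = f_i(x)
        eta = eta_init
        for _ in range(max_backtracks):
            if f_i(x - eta * g) <= fx - c * eta * np.dot(g, g):
                break
            eta *= beta
        return x - eta * g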
Regularization via mass transportation
The goal of regression and classification methods in supervised learning is to minimize the
empirical risk, that is, the expectation of some loss function quantifying the prediction error …
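The empirical risk referred to here and, as a hedged sketch of the paper's theme, its distributionally robust counterpart over a Wasserstein ball of radius epsilon around the empirical distribution:

    \hat{R}_n(\theta) = \frac{1}{n}\sum_{i=1}^{n} \ell\!\left(f_\theta(x_i), y_i\right),
    \qquad
    \min_{\theta}\; \sup_{\mathbb{Q}:\, W(\mathbb{Q}, \hat{\mathbb{P}}_n) \le \varepsilon}\; \mathbb{E}_{\mathbb{Q}}\!\left[\ell\!\left(f_\theta(x), y\right)\right].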
A wavelet-based deep learning pipeline for efficient COVID-19 diagnosis via CT slices
The quick diagnosis of the novel coronavirus (COVID-19) disease is vital to prevent its
propagation and improve therapeutic outcomes. Computed tomography (CT) is believed to …
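As a hedged sketch of the wavelet preprocessing stage only (the wavelet choice, decomposition level, and downstream network are the paper's, not shown here):

    import numpy as np
    import pywt

    def wavelet_features(ct_slice, wavelet="haar"):
        # Single-level 2D discrete wavelet transform of a CT slice: approximation
        # band plus horizontal/vertical/diagonal detail bands.
        cA, (cH, cV, cD) = pywt.dwt2(np.asarray(ct_slice, dtype=float), wavelet)
        return cA, cH, cV, cD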
Quasi-hyperbolic momentum and Adam for deep learning
Momentum-based acceleration of stochastic gradient descent (SGD) is widely used in deep
learning. We propose the quasi-hyperbolic momentum algorithm (QHM) as an extremely …
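The QHM update mixes the plain stochastic gradient with an exponential moving average of gradients; a minimal sketch with the commonly cited defaults nu = 0.7, beta = 0.999 (variable names are illustrative):

    def qhm_step(theta, g_buf, grad, lr=0.1, beta=0.999, nu=0.7):
        # Quasi-hyperbolic momentum: update a dampened moving average of gradients,
        # then step along a weighted mix of the plain gradient and that average
        # (nu = 0 recovers plain SGD; nu = 1 recovers momentum SGD with dampening).
        g_buf = beta * g_buf + (1.0 - beta) * grad
        theta = theta - lr * ((1.0 - nu) * grad + nu * g_buf)
        return theta, g_buf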