A high-bias, low-variance introduction to machine learning for physicists

P Mehta, M Bukov, CH Wang, AGR Day, C Richardson… - Physics Reports, 2019 - Elsevier
Abstract Machine Learning (ML) is one of the most exciting and dynamic areas of modern
research and application. The purpose of this review is to provide an introduction to the core …

Nonconvex optimization meets low-rank matrix factorization: An overview

Y Chi, YM Lu, Y Chen - IEEE Transactions on Signal …, 2019 - ieeexplore.ieee.org
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
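
The prototypical approach in this line of work is the Burer–Monteiro factorization: write the rank-r matrix as a product LR^T and run gradient descent directly on the nonconvex least-squares objective. A minimal NumPy sketch (problem sizes, initialization scale, and step size below are our illustrative choices, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 50, 40, 3

# Ground-truth rank-r matrix to recover.
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))

# Nonconvex objective: f(L, R) = 0.5 * ||L R^T - M||_F^2.
L = 0.1 * rng.standard_normal((n, r))
R = 0.1 * rng.standard_normal((m, r))
eta = 0.01  # step size, hand-tuned for this scale

for _ in range(2000):
    E = L @ R.T - M                                  # residual
    L, R = L - eta * (E @ R), R - eta * (E.T @ L)    # gradient steps on both factors

print("relative error:", np.linalg.norm(L @ R.T - M) / np.linalg.norm(M))
```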

Adan: Adaptive nesterov momentum algorithm for faster optimizing deep models

X **e, P Zhou, H Li, Z Lin, S Yan - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
In deep learning, different kinds of deep networks typically need different optimizers, which
have to be chosen after multiple trials, making the training process inefficient. To relieve this …
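
Adan combines Nesterov-style momentum with moving averages of gradient differences. A single-step sketch of the update as we read it from the paper (the β convention and defaults follow the authors' reference implementation; bias correction is omitted for brevity):

```python
import numpy as np

def adan_step(x, g, g_prev, state, lr=1e-3,
              b1=0.02, b2=0.08, b3=0.01, eps=1e-8, wd=0.0):
    """One Adan step; state = (m, v, n) carried over from the previous step."""
    m, v, n = state
    d = g - g_prev                        # gradient difference
    m = (1 - b1) * m + b1 * g             # EMA of gradients
    v = (1 - b2) * v + b2 * d             # EMA of gradient differences
    u = g + (1 - b2) * d                  # Nesterov-style lookahead gradient
    n = (1 - b3) * n + b3 * u**2          # EMA of squared lookahead gradient
    step = lr * (m + (1 - b2) * v) / (np.sqrt(n) + eps)
    x = (x - step) / (1 + lr * wd)        # decoupled weight decay
    return x, (m, v, n)
```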

signSGD: Compressed optimisation for non-convex problems

J Bernstein, YX Wang… - International …, 2018 - proceedings.mlr.press
Training large neural networks requires distributing learning across multiple workers, where
the cost of communicating gradients can be a significant bottleneck. signSGD alleviates this …
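
The update itself is one line: each worker transmits only the sign of its gradient (one bit per coordinate), and the step moves every coordinate by exactly the learning rate. A single-worker toy sketch:

```python
import numpy as np

def signsgd_step(x, g, lr=0.01):
    """signSGD: descend along the elementwise sign of the gradient, so each
    coordinate moves by exactly +/- lr and only signs need communicating."""
    return x - lr * np.sign(g)

# Toy run on f(x) = 0.5 * ||x||^2, whose gradient is x itself.
x = np.array([3.0, -2.0, 0.5])
for _ in range(500):
    x = signsgd_step(x, x)
print(x)  # each coordinate ends up oscillating within lr of 0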

Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator

C Fang, CJ Li, Z Lin, T Zhang - Advances in neural …, 2018 - proceedings.neurips.cc
In this paper, we propose a new technique named Stochastic Path-Integrated
Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of …
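
SPIDER maintains a recursive gradient estimator: a full (large-batch) gradient is computed periodically, and between refreshes the estimate is corrected on small minibatches via v_k = v_{k-1} + ∇f_S(x_k) − ∇f_S(x_{k-1}). A sketch on a least-squares finite sum (epoch length, batch size, and step size are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad(x, idx):
    """Minibatch gradient of the least-squares loss 0.5 * mean((Ax - b)^2)."""
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

x, x_prev = np.zeros(d), np.zeros(d)
v = np.zeros(d)
q, batch, eta = 50, 20, 0.05
for k in range(500):
    if k % q == 0:
        v = grad(x, np.arange(n))            # periodic full-batch refresh
    else:
        idx = rng.integers(0, n, batch)      # same minibatch at both points
        v = v + grad(x, idx) - grad(x_prev, idx)   # path-integrated correction
    x_prev, x = x, x - eta * v               # step along the estimator
print("gradient norm:", np.linalg.norm(grad(x, np.arange(n))))
```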

On the convergence of a class of adam-type algorithms for non-convex optimization

X Chen, S Liu, R Sun, M Hong - arXiv preprint arXiv:1808.02941, 2018 - arxiv.org
This paper studies a class of adaptive gradient based momentum algorithms that update the
search directions and learning rates simultaneously using past gradients. This class, which …
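
The class analyzed shares the Adam template: a momentum buffer built from past gradients and a coordinatewise learning rate derived from a moving average of squared gradients. A generic single-step sketch:

```python
import numpy as np

def adam_step(x, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One generic Adam-type step (t counts steps from 1, for bias correction)."""
    m = b1 * m + (1 - b1) * g          # search direction: EMA of gradients
    v = b2 * v + (1 - b2) * g**2       # EMA of squared gradients
    m_hat = m / (1 - b1**t)            # bias-corrected estimates
    v_hat = v / (1 - b2**t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)   # coordinatewise adaptive step
    return x, m, v
```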

ReduNet: A white-box deep network from the principle of maximizing rate reduction

KHR Chan, Y Yu, C You, H Qi, J Wright, Y Ma - Journal of machine learning …, 2022 - jmlr.org
This work attempts to provide a plausible theoretical framework that aims to interpret modern
deep (convolutional) networks from the principles of data compression and discriminative …
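
The quantity ReduNet is built around is the rate reduction ΔR(Z) = R(Z) − R_c(Z): expand the coding rate of all features while compressing each class. A sketch of the objective itself, with R(Z) = (1/2) logdet(I + d/(nε²) ZZ^T) for features Z ∈ R^{d×n} (ε and the label encoding are placeholders):

```python
import numpy as np

def coding_rate(Z, eps=0.5):
    """R(Z) = 0.5 * logdet(I + d/(n * eps^2) * Z Z^T), for Z in R^{d x n}."""
    d, n = Z.shape
    sign, logdet = np.linalg.slogdet(np.eye(d) + (d / (n * eps**2)) * Z @ Z.T)
    return 0.5 * logdet

def rate_reduction(Z, labels, eps=0.5):
    """Delta R = R(Z) - sum_j (n_j / n) * R(Z_j): whole set minus classes."""
    Rc = sum(np.mean(labels == c) * coding_rate(Z[:, labels == c], eps)
             for c in np.unique(labels))
    return coding_rate(Z, eps) - Rc
```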

Global convergence of policy gradient methods to (almost) locally optimal policies

K Zhang, A Koppel, H Zhu, T Basar - SIAM Journal on Control and …, 2020 - SIAM
Policy gradient (PG) methods have been one of the most essential ingredients of
reinforcement learning, with application in a variety of domains. In spite of the empirical …
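
The estimator underlying all PG methods is REINFORCE: ascend r · ∇ log π_θ(a). A toy bandit sketch with a softmax policy (the paper's analysis covers general MDPs; this only illustrates the estimator, and all sizes are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.9])   # toy 3-armed bandit
theta = np.zeros(3)                      # softmax-policy logits
lr = 0.1

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for _ in range(5000):
    p = softmax(theta)
    a = rng.choice(3, p=p)                           # sample an action
    r = true_means[a] + 0.1 * rng.standard_normal()  # noisy reward
    grad_log_pi = -p                                 # for a softmax policy,
    grad_log_pi[a] += 1.0                            # grad log pi(a) = e_a - p
    theta += lr * r * grad_log_pi                    # stochastic PG ascent

print("learned policy:", softmax(theta))  # mass concentrates on the best arm
```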

Gradient descent with random initialization: Fast global convergence for nonconvex phase retrieval

Y Chen, Y Chi, J Fan, C Ma - Mathematical Programming, 2019 - Springer
This paper considers the problem of solving systems of quadratic equations, namely,
recovering an object of interest x^♮ ∈ R^n from m quadratic equations/samples …
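
The algorithm analyzed is plain gradient descent on the natural least-squares loss f(x) = (1/4m) Σ_i ((a_i^T x)² − y_i)², started from a random point rather than a spectral initialization. A NumPy sketch (sizes and step size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 200
x_star = rng.standard_normal(n)
x_star /= np.linalg.norm(x_star)         # unit-norm signal, identifiable up to sign
A = rng.standard_normal((m, n))
y = (A @ x_star)**2                      # quadratic samples y_i = (a_i^T x)^2

x = rng.standard_normal(n) / np.sqrt(n)  # random initialization
eta = 0.1
for _ in range(3000):
    Ax = A @ x
    x -= eta * A.T @ ((Ax**2 - y) * Ax) / m   # grad of (1/4m) sum ((a^T x)^2 - y)^2

print("dist to +/- x_star:",
      min(np.linalg.norm(x - x_star), np.linalg.norm(x + x_star)))
```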

Stochastic nested variance reduction for nonconvex optimization

D Zhou, P Xu, Q Gu - Journal of machine learning research, 2020 - jmlr.org
We study nonconvex optimization problems, where the objective function is either an
average of n nonconvex functions or the expectation of some stochastic function. We …
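
The nested scheme stacks several reference points with different refresh frequencies; its basic building block is the classic SVRG control variate, sketched below for the finite-sum case (the nesting itself is omitted, and the problem instance is our toy choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 500, 5
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(x, i):
    """Gradient of the i-th component loss 0.5 * (a_i^T x - b_i)^2."""
    return A[i] * (A[i] @ x - b[i])

def full_grad(x):
    return A.T @ (A @ x - b) / n

x = np.zeros(d)
eta = 0.01
for _ in range(30):                      # outer loop: refresh the snapshot
    x_ref, mu = x.copy(), full_grad(x)
    for _ in range(n):                   # inner loop: cheap variance-reduced steps
        i = rng.integers(n)
        v = grad_i(x, i) - grad_i(x_ref, i) + mu   # control-variate estimator
        x -= eta * v

print("gradient norm:", np.linalg.norm(full_grad(x)))
```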