A high-bias, low-variance introduction to machine learning for physicists
Machine Learning (ML) is one of the most exciting and dynamic areas of modern
research and application. The purpose of this review is to provide an introduction to the core …
Nonconvex optimization meets low-rank matrix factorization: An overview
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
Gradient descent finds global minima of deep neural networks
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …
Gradient descent provably optimizes over-parameterized neural networks
One of the mysteries in the success of neural networks is that randomly initialized first-order
methods like gradient descent can achieve zero training loss even though the objective …
Dying ReLU and initialization: Theory and numerical examples
The dying ReLU refers to the problem when ReLU neurons become inactive and only output
0 for any input. There are many empirical and heuristic explanations of why ReLU neurons …
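(Illustrative aside, not from the paper above: a minimal NumPy sketch of the dying-ReLU situation just described. The data, weights, and bias are hypothetical, chosen only so that every pre-activation is negative, which makes the ReLU output and its gradient identically zero.)

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))       # 100 made-up inputs with 3 features
    w = np.array([-1.0, -1.0, -1.0])    # poorly initialized weights
    b = -10.0                           # large negative bias

    z = X @ w + b                       # pre-activations: all strongly negative
    a = np.maximum(z, 0.0)              # ReLU output: identically 0
    grad_mask = (z > 0).astype(float)   # ReLU'(z) = 1[z > 0]: identically 0

    print(a.sum(), grad_mask.sum())     # both 0.0 -- the neuron is "dead"

Because no gradient flows through a negative pre-activation, gradient descent never updates w or b, so a dead neuron cannot recover; this is the initialization issue the entry analyzes.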
First-order methods almost always avoid strict saddle points
We establish that first-order methods avoid strict saddle points for almost all initializations.
Our results apply to a wide variety of first-order methods, including (manifold) gradient …
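(Illustrative aside, not from the paper above: the standard example of a strict saddle point, i.e. a critical point at which the Hessian has a strictly negative eigenvalue.)

\[
f(x, y) = x^{2} - y^{2}, \qquad \nabla f(0, 0) = 0, \qquad
\nabla^{2} f(0, 0) = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix},
\]

so the origin is a strict saddle with an escape direction along $y$; the result above states that gradient-type methods started from almost every initialization avoid converging to such points.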
Understanding the acceleration phenomenon via high-resolution differential equations
Gradient-based optimization algorithms can be studied from the perspective of limiting
ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not …
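(Illustrative aside, not from the paper above: the low-resolution limiting ODEs that this line of work refines. As the step size $s \to 0$, gradient descent on $f$ tracks the gradient flow, and Nesterov's accelerated method for convex $f$ tracks the ODE of Su, Boyd, and Candès:)

\[
\dot{X}(t) = -\nabla f\big(X(t)\big), \qquad\qquad
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\big(X(t)\big) = 0 .
\]

The "high-resolution" ODEs referred to in the title retain additional $O(\sqrt{s})$ correction terms that these limits discard, which is what allows them to distinguish methods sharing the same low-resolution limit.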
Fixing by mixing: A recipe for optimal Byzantine ML under heterogeneity
Byzantine machine learning (ML) aims to ensure the resilience of distributed learning
algorithms to misbehaving (or Byzantine) machines. Although this problem received …
Neural collapse with normalized features: A geometric analysis over the Riemannian manifold
When training overparameterized deep networks for classification tasks, it has been widely
observed that the learned features exhibit a so-called "neural collapse" phenomenon. More …
On the power of over-parametrization in neural networks with quadratic activation
We provide new theoretical insights on why over-parametrization is effective in learning
neural networks. For a $k$ hidden node shallow network with quadratic activation and $n$ …
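(Illustrative aside, not from the paper above, and not necessarily its exact parameterization: a shallow network with $k$ hidden nodes and quadratic activation on an input $x \in \mathbb{R}^{d}$ can be written as)

\[
f(x) \;=\; \sum_{j=1}^{k} a_{j}\,\big(w_{j}^{\top} x\big)^{2}
\;=\; x^{\top}\Big(\sum_{j=1}^{k} a_{j}\, w_{j} w_{j}^{\top}\Big)\, x ,
\]

(with hidden weights $w_{j}$ and output weights $a_{j}$), i.e. a quadratic form of rank at most $k$ in $x$, which is why over-parametrization in this model connects naturally to low-rank matrix factorization.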