Nonconvex optimization meets low-rank matrix factorization: An overview

Y Chi, YM Lu, Y Chen - IEEE Transactions on Signal …, 2019 - ieeexplore.ieee.org
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
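
A minimal sketch of the kind of nonconvex formulation this survey covers: gradient descent on the factored (Burer-Monteiro) objective f(X) = 0.25 * ||X X^T - M||_F^2 for a symmetric rank-r target M. The sizes, initialization scale, and step size below are illustrative assumptions, not the survey's own code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3
X_true = rng.standard_normal((n, r))
M = X_true @ X_true.T                    # symmetric rank-r ground truth

X = 0.1 * rng.standard_normal((n, r))    # small random initialization (assumed)
eta = 1e-3                               # step size: an untuned assumption
for _ in range(2000):
    grad = (X @ X.T - M) @ X             # gradient of 0.25*||X X^T - M||_F^2
    X -= eta * grad

print("relative error:", np.linalg.norm(X @ X.T - M) / np.linalg.norm(M))
```

Despite the nonconvexity in X (the factorization is only identifiable up to rotation), plain gradient descent from small random initialization recovers M in this toy run, which is the kind of behavior the survey's guarantees formalize.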

On the implicit bias in deep-learning algorithms

G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic improvements in multiple domains …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Fine-tuning can distort pretrained features and underperform out-of-distribution

A Kumar, A Raghunathan, R Jones, T Ma… - arXiv preprint arXiv …, 2022 - arxiv.org
When transferring a pretrained model to a downstream task, two popular methods are full
fine-tuning (updating all the model parameters) and linear probing (updating only the last …
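
The two transfer strategies the snippet names are easy to state in code. A hedged PyTorch sketch: the tiny backbone, head, and learning rates are stand-ins invented here, not the authors' setup.

```python
import torch
import torch.nn as nn

# Stand-in "pretrained" backbone and a fresh task-specific head.
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
head = nn.Linear(64, 10)
model = nn.Sequential(backbone, head)

# Linear probing: freeze every backbone parameter, train only the head.
for p in backbone.parameters():
    p.requires_grad = False
probe_opt = torch.optim.SGD(head.parameters(), lr=1e-2)

# Full fine-tuning: unfreeze everything and optimize all parameters
# (typically with a smaller learning rate than probing).
for p in backbone.parameters():
    p.requires_grad = True
finetune_opt = torch.optim.SGD(model.parameters(), lr=1e-4)
```

The paper's question is which of these two optimizers' training loops preserves the pretrained features well enough to generalize out of distribution.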

Reconciling modern machine-learning practice and the classical bias–variance trade-off

M Belkin, D Hsu, S Ma… - Proceedings of the …, 2019 - National Academy of Sciences
Breakthroughs in machine learning are rapidly changing science and society, yet our
fundamental understanding of this technology has lagged far behind. Indeed, one of the …

Deep learning: a statistical viewpoint

PL Bartlett, A Montanari, A Rakhlin - Acta numerica, 2021 - cambridge.org
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …

The implicit bias of gradient descent on separable data

D Soudry, E Hoffer, MS Nacson, S Gunasekar… - Journal of Machine …, 2018 - jmlr.org
We examine gradient descent on unregularized logistic regression problems, with
homogeneous linear predictors on linearly separable datasets. We show the predictor …
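
The paper's setting admits a compact numerical illustration: full-batch gradient descent on the unregularized logistic loss over linearly separable data, where the direction w_t/||w_t|| stabilizes (toward the max-margin direction, per the paper) while ||w_t|| keeps growing. The data, step size, and iteration counts below are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 0.5, (20, 2)),
               rng.normal(-2.0, 0.5, (20, 2))])  # two linearly separable clusters
y = np.array([1.0] * 20 + [-1.0] * 20)

w = np.zeros(2)
eta = 0.1
for t in range(1, 100001):
    m = np.clip(y * (X @ w), -30, 30)            # margins, clipped for stability
    # gradient of mean(log(1 + exp(-y * x.w)))
    grad = -(X * (y / (1 + np.exp(m)))[:, None]).mean(axis=0)
    w -= eta * grad
    if t in (100, 1000, 10000, 100000):
        print(t, round(np.linalg.norm(w), 2), w / np.linalg.norm(w))
```

The printout shows the norm diverging slowly (logarithmically in t, per the paper) while the normalized direction barely changes after the first few thousand steps.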

On the global convergence of gradient descent for over-parameterized models using optimal transport

L Chizat, F Bach - Advances in neural information …, 2018 - proceedings.neurips.cc
Many tasks in machine learning and signal processing can be solved by minimizing a
convex function of a measure. This includes sparse spikes deconvolution or training a …
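
The truncated sentence refers to training a single-hidden-layer network, which this line of work treats as optimizing over a measure on neurons. A hedged sketch of that particle view, with a toy target and invented scales: gradient descent on m neurons is the m-particle discretization of the corresponding Wasserstein gradient flow on the measure.

```python
import numpy as np

rng = np.random.default_rng(2)
m = 500                                  # neurons, i.e. particles
xs = np.linspace(-1.0, 1.0, 64)
target = np.sin(3 * xs)                  # toy regression target (assumed)

a = rng.standard_normal(m) / m           # output weights with mean-field 1/m scaling
b = rng.standard_normal(m)               # input weights
eta = 0.5
for _ in range(3000):
    feats = np.tanh(np.outer(b, xs))     # m x 64 hidden activations
    resid = a @ feats - target           # network output minus target
    a -= eta * (feats @ resid) / xs.size
    b -= eta * ((a[:, None] * (1 - feats**2) * xs[None, :]) @ resid) / xs.size

final = a @ np.tanh(np.outer(b, xs))
print("final mse:", np.mean((final - target) ** 2))
```

Each (a_i, b_i) pair moves under the gradient of the shared loss, which is exactly a system of interacting particles; the paper's result concerns what this flow converges to as m grows.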

Learning overparameterized neural networks via stochastic gradient descent on structured data

Y Li, Y Liang - Advances in neural information processing …, 2018 - proceedings.neurips.cc
Neural networks have many successful applications, yet far less theoretical
understanding has been gained. Towards bridging this gap, we study the problem of …

Gradient starvation: A learning proclivity in neural networks

M Pezeshki, O Kaba, Y Bengio… - Advances in …, 2021 - proceedings.neurips.cc
We identify and formalize a fundamental gradient descent phenomenon resulting in a
learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …
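
The truncated sentence describes features competing for gradient. A toy illustration, invented here but consistent with the abstract's claim: when one feature separates the data with a large margin and another is weaker but still predictive, cross-entropy gradient descent concentrates weight on the strong feature and learns the weak one far more slowly.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
y = np.where(rng.random(n) < 0.5, 1.0, -1.0)
strong = 3.0 * y + rng.normal(0, 0.1, n)   # large-margin feature
weak = 0.5 * y + rng.normal(0, 0.1, n)     # weaker but still predictive feature
X = np.column_stack([strong, weak])

w = np.zeros(2)
eta = 0.1
for _ in range(5000):
    m = np.clip(y * (X @ w), -30, 30)
    # gradient of the mean logistic loss
    grad = -(X * (y / (1 + np.exp(m)))[:, None]).mean(axis=0)
    w -= eta * grad

print("weights (strong, weak):", w)        # the strong feature dominates
```

Once the strong feature fits the data, the sigmoid factor in the gradient shrinks for every example, so little gradient remains to push weight onto the weak feature: the dynamics that the paper names gradient starvation.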