Nonconvex optimization meets low-rank matrix factorization: An overview
Substantial progress has been made recently on developing provably accurate and efficient algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
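To make the surveyed approach concrete, here is a minimal, hypothetical sketch (not from the overview itself) of the nonconvex strategy: parameterize a rank-r matrix by its two factors and run plain gradient descent on them. Sizes, step size, and iteration count are illustrative choices.

```python
import numpy as np

# Toy instance of nonconvex low-rank factorization: recover a rank-r matrix
# M by gradient descent on f(L, R) = 0.5 * ||L @ R.T - M||_F^2.
rng = np.random.default_rng(0)
n, r = 50, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((n, r)).T  # rank-r target
M /= np.linalg.norm(M, 2)                                        # normalize scale

L = 0.1 * rng.standard_normal((n, r))   # small random initialization
R = 0.1 * rng.standard_normal((n, r))
eta = 0.1                                # step size (illustrative)

for _ in range(3000):
    E = L @ R.T - M                              # residual
    L, R = L - eta * E @ R, R - eta * E.T @ L    # gradient steps on both factors

print("relative error:", np.linalg.norm(L @ R.T - M) / np.linalg.norm(M))
```

Despite the nonconvexity of the objective in (L, R), gradient descent from small random initialization typically drives the relative error to near zero on toy instances like this one.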
On the implicit bias in deep-learning algorithms
G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic improvements in multiple domains …
On the opportunities and risks of foundation models
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Fine-tuning can distort pretrained features and underperform out-of-distribution
When transferring a pretrained model to a downstream task, two popular methods are full fine-tuning (updating all the model parameters) and linear probing (updating only the last …
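As an illustration of the two transfer strategies the abstract contrasts (a hypothetical toy model, not the paper's setup): linear probing freezes the pretrained backbone and trains only the final linear layer, while full fine-tuning updates every parameter.

```python
import torch
import torch.nn as nn

# Stand-in "pretrained" backbone and task head; a real experiment would load
# pretrained weights instead of random ones.
backbone = nn.Sequential(nn.Linear(32, 64), nn.ReLU())
head = nn.Linear(64, 10)

# Linear probing: freeze the pretrained features, optimize the head only.
for p in backbone.parameters():
    p.requires_grad = False
probe_opt = torch.optim.SGD(head.parameters(), lr=1e-2)

# Full fine-tuning: unfreeze everything and optimize all parameters.
for p in backbone.parameters():
    p.requires_grad = True
finetune_opt = torch.optim.SGD(
    list(backbone.parameters()) + list(head.parameters()), lr=1e-3
)

# One illustrative step of full fine-tuning on random data.
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(head(backbone(x)), y)
loss.backward()
finetune_opt.step()  # with probe_opt.step(), only the head would move
```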
Reconciling modern machine-learning practice and the classical bias–variance trade-off
Breakthroughs in machine learning are rapidly changing science and society, yet our fundamental understanding of this technology has lagged far behind. Indeed, one of the …
Deep learning: a statistical viewpoint
The remarkable practical success of deep learning has revealed some major surprises from a theoretical perspective. In particular, simple gradient methods easily find near-optimal …
The implicit bias of gradient descent on separable data
We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor …
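A quick numerical illustration of the phenomenon this paper analyzes (a toy sketch on assumed data, not the authors' code): on separable data the logistic loss has no finite minimizer, so ||w|| grows without bound while the direction w/||w|| stabilizes; the paper shows the limit direction is the L2 max-margin (hard-SVM) solution.

```python
import numpy as np

# Gradient descent on unregularized logistic loss over a linearly separable
# two-class toy set. Data, step size, and checkpoints are illustrative.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 1.0, (20, 2)), rng.normal(-2.0, 1.0, (20, 2))])
y = np.array([1.0] * 20 + [-1.0] * 20)

w = np.zeros(2)
eta = 0.1
for t in range(1, 30001):
    margins = y * (X @ w)
    grad = -(y / (1.0 + np.exp(margins))) @ X / len(y)  # gradient of logistic loss
    w -= eta * grad
    if t in (300, 3000, 30000):
        # The norm keeps growing, but the direction settles down.
        print(t, np.linalg.norm(w), w / np.linalg.norm(w))
```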
On the global convergence of gradient descent for over-parameterized models using optimal transport
Many tasks in machine learning and signal processing can be solved by minimizing a convex function of a measure. This includes sparse spikes deconvolution or training a …
Learning overparameterized neural networks via stochastic gradient descent on structured data
Neural networks have many successful applications, while much less theoretical understanding has been gained. Towards bridging this gap, we study the problem of …
Gradient starvation: A learning proclivity in neural networks
We identify and formalize a fundamental gradient descent phenomenon resulting in a learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …