(S) GD over Diagonal Linear Networks: Implicit bias, Large Stepsizes and Edge of Stability
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over $2 …
Stochastic gradient descent under Markovian sampling schemes
M Even - International Conference on Machine Learning, 2023 - proceedings.mlr.press
We study a variation of vanilla stochastic gradient descent where the optimizer only has
access to a Markovian sampling scheme. These schemes encompass applications that …
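For intuition only, a minimal sketch of the setting (not the paper's algorithm or assumptions): vanilla SGD in which the sample index is produced by a Markov chain, here a lazy random walk over the data indices, so consecutive samples are correlated rather than i.i.d. The least-squares objective and stepsize are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Interpolated least-squares problem (illustrative): f_i(w) = 0.5 * (x_i @ w - y_i)^2.
n, d = 50, 5
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star

# Markovian sampling scheme (illustrative): a lazy random walk on the index cycle
# 0..n-1, so the next sample index depends on the current one.
def next_index(i):
    return int((i + rng.choice([-1, 0, 1])) % n)

w = np.zeros(d)
i = 0
step = 0.01
for _ in range(20_000):
    i = next_index(i)                  # Markov transition instead of i.i.d. sampling
    grad_i = (X[i] @ w - y[i]) * X[i]  # stochastic gradient of f_i at w
    w -= step * grad_i                 # vanilla SGD update

print("distance to w_star:", np.linalg.norm(w - w_star))
```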
(S) GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over …
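For concreteness, a minimal sketch of the model class in the title, assuming the common beta = u * v parametrisation: a 2-layer diagonal linear network trained by full-batch GD on a squared loss. The sparse-regression data, initialisation scale, and stepsize below are illustrative choices, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)

# Sparse regression data (illustrative): y = X @ beta_star with a 3-sparse beta_star.
n, d = 40, 100
X = rng.normal(size=(n, d))
beta_star = np.zeros(d)
beta_star[:3] = 1.0
y = X @ beta_star

# 2-layer diagonal linear network: prediction <u * v, x>, trained by full-batch GD.
alpha = 0.1                 # initialisation scale (drives the implicit regularisation)
u = alpha * np.ones(d)
v = np.zeros(d)
step = 0.01

for _ in range(10_000):
    beta = u * v                    # effective linear predictor
    resid = X @ beta - y
    grad_beta = X.T @ resid / n     # gradient of the mean squared loss w.r.t. beta
    # chain rule through beta = u * v (simultaneous update of both layers)
    u, v = u - step * grad_beta * v, v - step * grad_beta * u

print("largest recovered coordinates:", np.argsort(-np.abs(u * v))[:3])
```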
Local convexity of the TAP free energy and AMP convergence for Z2-synchronization
The Annals of Statistics 2023, Vol. 51, No. 2, 519–546. https://doi.org/10.1214/23-AOS2257 …
Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize
We investigate the convergence of stochastic mirror descent (SMD) under interpolation in
relatively smooth and smooth convex optimization. In relatively smooth convex optimization …
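A hedged sketch of the kind of update studied: stochastic mirror descent on the probability simplex with the negative-entropy mirror map, using a Polyak-type stochastic stepsize of the form (f_i(x) - f_i*) / (c * ||grad f_i(x)||^2). The exact mSPS variant, norm, constants and capping in the paper may differ; the interpolated problem below is illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Interpolated least-squares problem over the probability simplex (illustrative):
# f_i(x) = 0.5 * (a_i @ x - b_i)^2 with b_i = a_i @ x_star, so min f_i = 0 for every i.
n, d = 100, 10
A = rng.normal(size=(n, d))
x_star = rng.dirichlet(np.ones(d))
b = A @ x_star

x = np.ones(d) / d       # uniform initialisation on the simplex
gamma_max = 1.0          # cap on the stepsize
c = 1.0

for _ in range(5_000):
    i = rng.integers(n)
    r = A[i] @ x - b[i]
    f_i = 0.5 * r ** 2
    g = r * A[i]                                         # gradient of f_i at x
    gamma = min(f_i / (c * (g @ g) + 1e-12), gamma_max)  # Polyak-type stochastic stepsize
    x = x * np.exp(-gamma * g)                           # entropic mirror step ...
    x = x / x.sum()                                      # ... then renormalise onto the simplex

print("f(x) =", 0.5 * np.mean((A @ x - b) ** 2))
```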
Stochastic distributed optimization under average second-order similarity: Algorithms and analysis
We study finite-sum distributed optimization problems involving a master node and $n-1$
local nodes under the popular $\delta$-similarity and $\mu$-strong convexity conditions …
On sample optimality in personalized collaborative and federated learning
In personalized federated learning, each member of a potentially large set of agents aims to
train a model minimizing its loss function averaged over its local data distribution. We study …
Nonconvex stochastic Bregman proximal gradient method with application to deep learning
The widely used stochastic gradient methods for minimizing nonconvex composite objective
functions require the Lipschitz smoothness of the differentiable part. But the requirement …
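To make the update concrete, a minimal sketch of a stochastic Bregman proximal gradient step with the Boltzmann-Shannon entropy kernel over the positive orthant, for which the Bregman proximal step has a closed form. The kernel, the problem, and the absence of a nonsmooth composite term are illustrative assumptions; the paper's kernels and conditions for deep learning differ.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative problem on the positive orthant: nonnegative least squares,
# min_{x > 0} (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2.
n, d = 200, 20
A = np.abs(rng.normal(size=(n, d)))
x_star = np.abs(rng.normal(size=d))
b = A @ x_star

# Bregman kernel h(x) = sum_j x_j*log(x_j) - x_j. The Bregman proximal step
#   argmin_x  <g, x> + (1/eta) * D_h(x, x_t)
# has the closed form x_{t+1} = x_t * exp(-eta * g), which keeps every coordinate positive.
x = np.ones(d)
eta = 1e-3

for _ in range(20_000):
    i = rng.integers(n)
    g = (A[i] @ x - b[i]) * A[i]   # stochastic gradient of the smooth part
    x = x * np.exp(-eta * g)       # stochastic Bregman proximal gradient step

print("relative error:", np.linalg.norm(x - x_star) / np.linalg.norm(x_star))
```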
Minimizing Convex Functionals over Space of Probability Measures via KL Divergence Gradient Flow
Motivated by the computation of the non-parametric maximum likelihood estimator (NPMLE)
and the Bayesian posterior in statistics, this paper explores the problem of convex …
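As one concrete instance (an illustration only, not necessarily the paper's algorithm): for the NPMLE of a Gaussian location mixture with atoms restricted to a fixed grid, the negative log-likelihood is a convex functional of the weight vector, and a time-discretised KL / Fisher-Rao-type gradient flow on the weights reduces to a multiplicative update. The grid, unit variance, and stepsize are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Data from a two-component Gaussian location mixture (illustrative).
xs = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(2.0, 1.0, 150)])

# Restrict the mixing measure to a fixed grid of atoms; the NPMLE objective is then
# a convex functional of the weight vector w:
#   F(w) = -(1/n) * sum_j log( sum_i w_i * N(x_j; mu_i, 1) ).
mu = np.linspace(-6.0, 6.0, 61)
phi = np.exp(-0.5 * (xs[:, None] - mu[None, :]) ** 2) / np.sqrt(2 * np.pi)  # (n, grid)

w = np.ones(len(mu)) / len(mu)
eta = 0.2

for _ in range(3_000):
    mix = phi @ w                                # current mixture density at each data point
    grad = -(phi / mix[:, None]).mean(axis=0)    # first variation dF/dw_i
    w = w * np.exp(-eta * grad)                  # time-discretised KL / Fisher-Rao-type step
    w = w / w.sum()                              # renormalise to a probability vector

print("negative log-likelihood:", -np.log(phi @ w).mean())
```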
Two losses are better than one: Faster optimization using a cheaper proxy
We present an algorithm for minimizing an objective with hard-to-compute gradients by
using a related, easier-to-access function as a proxy. Our algorithm is based on approximate …
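As a rough illustration of the general idea (not the paper's exact algorithm): take one expensive gradient of the target f per outer step, then run cheap inner gradient steps on the proxy h shifted by the gradient gap grad f - grad h plus a proximal term around the current iterate. The quadratic target, diagonal proxy, proximal weight, and inner-loop length are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Target f: a quadratic whose gradient we pretend is expensive to compute.
# Proxy h: a cheaper quadratic approximating f (here, a diagonal approximation).
d = 50
M = rng.normal(size=(d, d)) / np.sqrt(d)
H = M @ M.T + np.eye(d)            # Hessian of the target f
H_proxy = np.diag(np.diag(H))      # Hessian of the cheap proxy h
x_star = rng.normal(size=d)

def grad_f(x):                     # "expensive" gradient of the target (illustrative)
    return H @ (x - x_star)

def grad_h(x):                     # cheap proxy gradient (illustrative)
    return H_proxy @ (x - x_star)

x = np.zeros(d)
lam = 1.0            # proximal regularisation around the current outer iterate
inner_steps = 20
inner_lr = 0.2

for _ in range(50):
    correction = grad_f(x) - grad_h(x)   # one expensive gradient per outer iteration
    z = x.copy()
    for _ in range(inner_steps):         # cheap inner loop on the corrected proxy
        g = grad_h(z) + correction + lam * (z - x)
        z = z - inner_lr * g
    x = z

print("distance to the minimiser:", np.linalg.norm(x - x_star))
```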