(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability

M Even, S Pesme, S Gunasekar… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over $2$-layer diagonal linear networks …
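
As a rough illustration of the setting in this entry (a simplification, not the paper's analysis or experiments): a 2-layer diagonal linear network parametrises the regressor as beta = u ⊙ v and gradient descent runs on the resulting non-convex loss in (u, v). All constants below (initialisation scale, stepsize, problem sizes) are placeholder values.

```python
import numpy as np

# Minimal sketch: full-batch gradient descent on a 2-layer diagonal linear
# network, i.e. the regressor is parametrised as beta = u * v (elementwise)
# and the squared loss is minimised over (u, v) rather than over beta.
rng = np.random.default_rng(0)
n, d = 40, 100                        # under-determined regression problem
X = rng.standard_normal((n, d))
beta_star = np.zeros(d)
beta_star[:5] = 1.0                   # sparse ground-truth regressor
y = X @ beta_star

alpha = 0.1                           # initialisation scale (placeholder)
u = alpha * np.ones(d)
v = np.zeros(d)
step = 0.01                           # stepsize (placeholder)

for _ in range(20000):
    residual = X @ (u * v) - y        # shape (n,)
    grad_beta = X.T @ residual / n    # gradient of the loss w.r.t. beta
    grad_u, grad_v = grad_beta * v, grad_beta * u   # chain rule through u * v
    u, v = u - step * grad_u, v - step * grad_v

beta_hat = u * v
print("train loss:", 0.5 * np.mean((X @ beta_hat - y) ** 2))
print("largest |beta| coordinates:", np.argsort(np.abs(beta_hat))[-5:])
```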

Stochastic gradient descent under Markovian sampling schemes

M Even - International Conference on Machine Learning, 2023 - proceedings.mlr.press
We study a variation of vanilla stochastic gradient descent where the optimizer only has
access to a Markovian sampling scheme. These schemes encompass applications that …
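
To make the sampling model concrete (a minimal sketch only, not the paper's algorithm or assumptions): the index used at each SGD step is produced by a Markov chain over the data indices, here a random walk on a ring, instead of being drawn i.i.d.; the chain, stepsize and problem sizes are placeholders.

```python
import numpy as np

# Minimal sketch: least-squares SGD where the sample index follows a Markov
# chain (a random walk on a ring of data indices) rather than being drawn
# i.i.d. uniformly at each step.  Slowly mixing chains degrade the effective
# sample efficiency, which is the regime such analyses care about.
rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
x_star = rng.standard_normal(d)
b = A @ x_star + 0.01 * rng.standard_normal(n)

x = np.zeros(d)
i = 0                                # current state of the Markov chain
step = 0.005                         # placeholder stepsize

for t in range(50_000):
    i = (i + rng.choice([-1, 1])) % n   # Markovian move to a neighbouring index
    g = (A[i] @ x - b[i]) * A[i]        # stochastic gradient at sample i
    x -= step * g

print("distance to x_star:", np.linalg.norm(x - x_star))
```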

(S)GD over Diagonal Linear Networks: Implicit Regularisation, Large Stepsizes and Edge of Stability

M Even, S Pesme, S Gunasekar… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over …

Local convexity of the TAP free energy and AMP convergence for $\mathbb{Z}_2$-synchronization

M Celentano, Z Fan, S Mei - The Annals of Statistics, 2023 - projecteuclid.org
The Annals of Statistics, 2023, Vol. 51, No. 2, 519–546. https://doi.org/10.1214/23-AOS2257

Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize

R D'Orazio, N Loizou, I Laradji, I Mitliagkas - arXiv preprint arXiv …, 2021 - arxiv.org
We investigate the convergence of stochastic mirror descent (SMD) under interpolation in
relatively smooth and smooth convex optimization. In relatively smooth convex optimization …
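
A simplified sketch of the idea (not the paper's mSPS method or its assumptions): stochastic mirror descent on the probability simplex with the entropic mirror map and a Polyak-type stochastic stepsize f_i(x_t)/||g_t||_∞² under interpolation with f_i* = 0; the problem instance and the choice of dual norm here are assumptions made for illustration.

```python
import numpy as np

# Sketch: entropic stochastic mirror descent on the simplex with a
# Polyak-type stochastic stepsize, under interpolation (every f_i is
# minimised at the same point x_star with f_i(x_star) = 0).
rng = np.random.default_rng(0)
n, d = 100, 20
x_star = rng.random(d)
x_star /= x_star.sum()                              # target point on the simplex
A = rng.standard_normal((n, d))
b = A @ x_star                                      # f_i(x) = 0.5*(A[i]@x - b[i])**2

x = np.ones(d) / d                                  # uniform initialisation
for t in range(5000):
    i = rng.integers(n)
    r = A[i] @ x - b[i]
    f_i = 0.5 * r ** 2
    g = r * A[i]                                    # stochastic gradient
    gamma = f_i / (np.max(np.abs(g)) ** 2 + 1e-12)  # Polyak-type stepsize
    x = x * np.exp(-gamma * g)                      # entropic mirror step
    x /= x.sum()                                    # renormalise onto the simplex

print("||x - x_star||_1 =", np.linalg.norm(x - x_star, 1))
```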

Stochastic distributed optimization under average second-order similarity: Algorithms and analysis

D Lin, Y Han, H Ye, Z Zhang - Advances in Neural …, 2024 - proceedings.neurips.cc
We study finite-sum distributed optimization problems involving a master node and $n-1$
local nodes under the popular $\delta$-similarity and $\mu$-strong convexity conditions …

On sample optimality in personalized collaborative and federated learning

M Even, L Massoulié, K Scaman - Advances in Neural …, 2022 - proceedings.neurips.cc
In personalized federated learning, each member of a potentially large set of agents aims to
train a model minimizing its loss function averaged over its local data distribution. We study …

Nonconvex stochastic Bregman proximal gradient method with application to deep learning

K Ding, J Li, KC Toh - arXiv preprint arXiv:2306.14522, 2023 - arxiv.org
The widely used stochastic gradient methods for minimizing nonconvex composite objective
functions require the Lipschitz smoothness of the differentiable part. But the requirement …
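
For illustration only (a generic sketch, not the paper's method or its deep-learning application): a stochastic Bregman gradient step with the kernel h(x) = ½||x||² + ¼||x||⁴, a standard reference function for quartic objectives such as phase retrieval whose gradients are not globally Lipschitz; the nonsmooth part of the composite objective is taken to be zero, and the stepsize and problem sizes are placeholders.

```python
import numpy as np

# Sketch: stochastic Bregman gradient step with kernel
# h(x) = 0.5*||x||^2 + 0.25*||x||^4 on a phase-retrieval-type objective
# f_i(x) = 0.25*((a_i@x)^2 - b_i)^2, whose gradient is not globally Lipschitz.
# With no nonsmooth term, the proximal step reduces to inverting grad h.
rng = np.random.default_rng(0)
m, d = 300, 10
A = rng.standard_normal((m, d))
x_star = rng.standard_normal(d) / np.sqrt(d)
b = (A @ x_star) ** 2                         # intensity-only measurements

def grad_h(x):
    return x * (1.0 + x @ x)

def grad_h_inv(z):
    # Solve x * (1 + ||x||^2) = z: the norm r = ||x|| is the unique real root
    # of r^3 + r - ||z|| = 0 (the largest real part among the cubic's roots).
    nz = np.linalg.norm(z)
    r = float(np.max(np.roots([1.0, 0.0, 1.0, -nz]).real))
    return z / (1.0 + r ** 2)

x = rng.standard_normal(d) / np.sqrt(d)
step = 0.002                                  # small placeholder stepsize
for t in range(30000):
    i = rng.integers(m)
    r_i = (A[i] @ x) ** 2 - b[i]
    g = r_i * (A[i] @ x) * A[i]               # stochastic gradient of f_i at x
    x = grad_h_inv(grad_h(x) - step * g)      # Bregman (mirror) update

print("mean residual:", np.mean(np.abs((A @ x) ** 2 - b)))
```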

Minimizing Convex Functionals over Space of Probability Measures via KL Divergence Gradient Flow

R Yao, L Huang, Y Yang - International Conference on …, 2024 - proceedings.mlr.press
Motivated by the computation of the non-parametric maximum likelihood estimator (NPMLE)
and the Bayesian posterior in statistics, this paper explores the problem of convex …
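
As a concrete anchor for the NPMLE mentioned here (a textbook grid-based iteration, not the paper's KL-gradient-flow algorithm): on a fixed grid of atoms, the classical fixed-point/EM update multiplies each mixing weight by its average posterior responsibility; the grid, sample size and iteration count below are placeholders.

```python
import numpy as np

# Textbook sketch: grid-based NPMLE for a Gaussian location mixture via the
# classical fixed-point (EM-style) multiplicative update on the mixing weights.
rng = np.random.default_rng(0)
n = 2000
true_means = np.array([-2.0, 0.0, 2.0])
z = rng.integers(3, size=n)
x = true_means[z] + rng.standard_normal(n)           # observed data

grid = np.linspace(-4, 4, 81)                        # fixed grid of atoms
w = np.full(grid.size, 1.0 / grid.size)              # uniform initial weights
# phi[i, j] = N(x_i | grid_j, 1) likelihood of sample i under atom j
phi = np.exp(-0.5 * (x[:, None] - grid[None, :]) ** 2) / np.sqrt(2 * np.pi)

for _ in range(500):
    mix = phi @ w                                    # mixture density at each x_i
    w = w * (phi.T @ (1.0 / mix)) / n                # average responsibilities

print("log-likelihood:", np.mean(np.log(phi @ w)))
print("heaviest atoms:", grid[np.argsort(w)[-5:]])
```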

Two losses are better than one: Faster optimization using a cheaper proxy

B Woodworth, K Mishchenko… - … Conference on Machine …, 2023 - proceedings.mlr.press
We present an algorithm for minimizing an objective with hard-to-compute gradients by
using a related, easier-to-access function as a proxy. Our algorithm is based on approximate …
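
One natural way to realise the proxy idea, sketched under assumptions (quadratic objective, a diagonal quadratic as the cheap proxy, hand-picked constants; not necessarily the paper's exact algorithm): run cheap inner steps on the proxy plus a linear bias correction from a single expensive gradient and a proximal term.

```python
import numpy as np

# Sketch: minimise an "expensive" objective f by repeatedly (approximately)
# minimising the surrogate
#     s_t(y) = h(y) + <grad f(x_t) - grad h(x_t), y> + (lam/2)*||y - x_t||^2,
# where h is a cheap proxy.  Only one gradient of f is used per outer step.
rng = np.random.default_rng(0)
d = 30
A = rng.standard_normal((200, d))
Q = A.T @ A / 200 + 0.1 * np.eye(d)
b = rng.standard_normal(d)

def grad_f(x):                     # "expensive" objective: 0.5*x'Qx - b'x
    return Q @ x - b

Qh = np.diag(np.diag(Q))           # proxy keeps only the diagonal of Q
def grad_h(x):                     # "cheap" proxy: 0.5*x'Qh x - b'x
    return Qh @ x - b

x = np.zeros(d)
lam, inner_step = 1.0, 0.1         # placeholder constants
for outer in range(200):
    bias = grad_f(x) - grad_h(x)   # one expensive gradient per outer step
    y = x.copy()
    for inner in range(20):        # cheap inner steps on the surrogate s_t
        y -= inner_step * (grad_h(y) + bias + lam * (y - x))
    x = y

print("distance to minimiser:", np.linalg.norm(x - np.linalg.solve(Q, b)))
```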