Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
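
The survey's subject is the plain stochastic gradient descent update w_{t+1} = w_t - η ∇f_i(w_t), where i is a randomly sampled example. A minimal sketch on a toy least-squares problem is given below; the toy data, step size, and iteration count are illustrative choices, not taken from the paper.

```python
import numpy as np

# Minimal SGD sketch on a toy least-squares problem (illustrative only).
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 20))          # data matrix
x_true = rng.normal(size=20)
b = A @ x_true + 0.01 * rng.normal(size=1000)

w = np.zeros(20)
lr = 0.01
for t in range(5000):
    i = rng.integers(len(b))             # sample one example uniformly
    grad_i = (A[i] @ w - b[i]) * A[i]    # gradient of 0.5*(a_i^T w - b_i)^2
    w -= lr * grad_i                     # SGD step: w <- w - lr * grad

print("error:", np.linalg.norm(w - x_true))
```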

Distributed machine learning for wireless communication networks: Techniques, architectures, and applications

S Hu, X Chen, W Ni, E Hossain… - … Surveys & Tutorials, 2021 - ieeexplore.ieee.org
Distributed machine learning (DML) techniques, such as federated learning, partitioned
learning, and distributed reinforcement learning, have been increasingly applied to wireless …

Federated learning: A survey on enabling technologies, protocols, and applications

M Aledhari, R Razzak, RM Parizi, F Saeed - IEEE Access, 2020 - ieeexplore.ieee.org
This paper provides a comprehensive study of Federated Learning (FL) with an emphasis
on enabling software and hardware platforms, protocols, real-life applications and use …

Federated optimization: Distributed machine learning for on-device intelligence

J Konečný, HB McMahan, D Ramage… - arXiv preprint arXiv …, 2016 - arxiv.org
We introduce a new and increasingly relevant setting for distributed optimization in machine
learning, where the data defining the optimization are unevenly distributed over an …
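
The setting described here, with data unevenly distributed over devices, is commonly handled by local-update schemes in the spirit of FederatedAveraging from the same research group. The sketch below shows generic local SGD with dataset-size-weighted server averaging; it is an illustration of that setting, not the specific algorithm proposed in this paper, and the toy clients and hyperparameters are assumptions.

```python
import numpy as np

# Sketch of federated optimization via local SGD + weighted averaging on
# toy least-squares clients with unevenly sized datasets (illustrative only).
rng = np.random.default_rng(1)
d = 10
x_true = rng.normal(size=d)
clients = []
for n in (50, 200, 800):                       # uneven local data sizes
    A = rng.normal(size=(n, d))
    b = A @ x_true + 0.01 * rng.normal(size=n)
    clients.append((A, b))

w = np.zeros(d)
for rnd in range(50):                          # communication rounds
    updates, sizes = [], []
    for A, b in clients:
        w_k = w.copy()
        for _ in range(10):                    # local SGD steps on-device
            i = rng.integers(len(b))
            w_k -= 0.01 * (A[i] @ w_k - b[i]) * A[i]
        updates.append(w_k)
        sizes.append(len(b))
    # server aggregates the local models, weighted by local dataset size
    w = np.average(updates, axis=0, weights=sizes)

print("error:", np.linalg.norm(w - x_true))
```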

SARAH: A novel method for machine learning problems using stochastic recursive gradient

LM Nguyen, J Liu, K Scheinberg… - … conference on machine …, 2017 - proceedings.mlr.press
In this paper, we propose a StochAstic Recursive grAdient algoritHm (SARAH), as well as its
practical variant SARAH+, as a novel approach to the finite-sum minimization problems …
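
SARAH's key ingredient is the recursive gradient estimate v_t = ∇f_{i_t}(w_t) - ∇f_{i_t}(w_{t-1}) + v_{t-1}, refreshed by a full gradient at the start of each outer loop. Below is a minimal sketch of that estimator on a toy least-squares finite sum; the problem sizes and step size are illustrative, not from the paper.

```python
import numpy as np

# Minimal sketch of SARAH's recursive gradient estimator on a toy
# least-squares finite sum (illustrative problem and step size).
rng = np.random.default_rng(2)
n, d = 500, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

def grad_i(w, i):                      # gradient of 0.5*(a_i^T w - b_i)^2
    return (A[i] @ w - b[i]) * A[i]

w, lr = np.zeros(d), 0.02
for epoch in range(10):                # outer loop
    v = A.T @ (A @ w - b) / n          # full gradient: v_0 = grad F(w_0)
    w_prev, w = w, w - lr * v
    for t in range(n):                 # inner loop: recursive estimate
        i = rng.integers(n)
        v = grad_i(w, i) - grad_i(w_prev, i) + v
        w_prev, w = w, w - lr * v

print("error:", np.linalg.norm(w - x_true))
```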

D²: Decentralized Training over Decentralized Data

H Tang, X Lian, M Yan, C Zhang… - … Conference on Machine …, 2018 - proceedings.mlr.press
While training a machine learning model using multiple workers, each of which collects data
from its own data source, it would be useful when the data collected from different workers …

Katyusha: The first direct acceleration of stochastic gradient methods

Z Allen-Zhu - Journal of Machine Learning Research, 2018 - jmlr.org
Nesterov's momentum trick is famously known for accelerating gradient descent, and has
been proven useful in building fast iterative algorithms. However, in the stochastic setting …
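
For reference, the "momentum trick" mentioned in the snippet is Nesterov's accelerated gradient method, which extrapolates before taking the gradient step. The sketch below shows the classical full-gradient version on a toy quadratic; it illustrates the trick being accelerated, not the Katyusha algorithm itself, and the toy problem is an assumption.

```python
import numpy as np

# Classical Nesterov accelerated gradient on a toy least-squares objective
# F(w) = ||Aw - b||^2 / (2n); illustrative only, not Katyusha.
rng = np.random.default_rng(5)
n, d = 500, 20
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true

L = np.linalg.norm(A, 2) ** 2 / n            # smoothness constant of F
w = np.zeros(d)
w_prev = w.copy()
for t in range(1, 500):
    y = w + (t - 1) / (t + 2) * (w - w_prev)  # momentum extrapolation
    grad = A.T @ (A @ y - b) / n              # full gradient at y
    w_prev, w = w, y - grad / L               # gradient step from y

print("error:", np.linalg.norm(w - x_true))
```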

Stochastic variance reduction for nonconvex optimization

SJ Reddi, A Hefny, S Sra, B Poczos… - … on machine learning, 2016 - proceedings.mlr.press
We study nonconvex finite-sum problems and analyze stochastic variance reduced gradient
(SVRG) methods for them. SVRG and related methods have recently surged into …

Variance-reduced methods for machine learning

RM Gower, M Schmidt, F Bach… - Proceedings of the …, 2020 - ieeexplore.ieee.org
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …

Momentum and stochastic momentum for stochastic gradient, newton, proximal point and subspace descent methods

N Loizou, P Richtárik - Computational Optimization and Applications, 2020 - Springer
In this paper we study several classes of stochastic optimization algorithms enriched with
heavy ball momentum. Among the methods studied are: stochastic gradient descent …
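
The heavy-ball momentum studied in this line of work modifies SGD to w_{t+1} = w_t - γ ∇f_i(w_t) + β (w_t - w_{t-1}). A minimal sketch on a toy least-squares problem is below; the toy data and hyperparameters (γ, β) are illustrative choices, not from the paper.

```python
import numpy as np

# Minimal sketch of SGD with heavy-ball momentum:
#   w_{t+1} = w_t - lr * grad_i(w_t) + beta * (w_t - w_{t-1})
# on a toy least-squares problem (illustrative only).
rng = np.random.default_rng(4)
n, d = 500, 10
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true + 0.01 * rng.normal(size=n)

w = np.zeros(d)
w_prev = w.copy()
lr, beta = 0.01, 0.9
for t in range(20000):
    i = rng.integers(n)
    g = (A[i] @ w - b[i]) * A[i]                     # stochastic gradient
    w, w_prev = w - lr * g + beta * (w - w_prev), w  # heavy-ball step

print("error:", np.linalg.norm(w - x_true))
```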