Variance reduced ProxSkip: Algorithm, theory and application to federated learning

G Malinovsky, K Yi, P Richtárik - Advances in Neural …, 2022 - proceedings.neurips.cc
We study distributed optimization methods based on the local training (LT) paradigm,
i.e., methods which achieve communication efficiency by performing richer local gradient …
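A minimal sketch of the local-training pattern the abstract alludes to: each client takes several local gradient steps before the server averages the results. The function name, the least-squares objective, and parameters such as local_steps are illustrative assumptions for a FedAvg-style round, not the paper's ProxSkip method.

```python
import numpy as np

def local_training_round(w_server, client_data, local_steps=5, lr=0.1):
    # One communication round: every client starts from the server model,
    # runs several local gradient steps on its own data, and the server
    # averages the resulting models (FedAvg-style aggregation).
    local_models = []
    for X, y in client_data:
        w = w_server.copy()
        for _ in range(local_steps):
            grad = X.T @ (X @ w - y) / len(y)  # local least-squares gradient
            w -= lr * grad
        local_models.append(w)
    return np.mean(local_models, axis=0)

# Toy usage: 4 clients with synthetic data, 10 communication rounds.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(4)]
w = np.zeros(3)
for _ in range(10):
    w = local_training_round(w, clients)
```

Taking multiple local steps per round is what trades extra local computation for fewer communication rounds, which is the efficiency the abstract refers to.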

SGD: General analysis and improved rates

RM Gower, N Loizou, X Qian… - International …, 2019 - proceedings.mlr.press
We propose a general yet simple theorem describing the convergence of SGD under the
arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of …
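A minimal sketch of one special case of the arbitrary sampling paradigm: single-element sampling from a user-chosen distribution, with importance weighting to keep the gradient estimator unbiased. The helper name and the probs parameter are illustrative assumptions; the paper's framework covers much more general minibatch samplings.

```python
import numpy as np

def sgd_arbitrary_sampling(grads, x0, probs, lr=0.01, n_iters=1000, seed=0):
    # SGD for f(x) = (1/n) * sum_i f_i(x), where component i is drawn
    # with an arbitrary probability probs[i] > 0 (probs sums to 1).
    # Dividing the sampled gradient by n * probs[i] makes the update
    # an unbiased estimate of the full gradient.
    rng = np.random.default_rng(seed)
    n = len(grads)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iters):
        i = rng.choice(n, p=probs)
        x -= lr * grads[i](x) / (n * probs[i])
    return x
```

Unbiasedness follows because E[grads[i](x) / (n * probs[i])] = sum_i probs[i] * grads[i](x) / (n * probs[i]) = (1/n) * sum_i grads[i](x), regardless of how the probabilities are chosen.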

A comprehensive survey on training acceleration for large machine learning models in IoT

H Wang, Z Qu, Q Zhou, H Zhang, B Luo… - IEEE Internet of …, 2021 - ieeexplore.ieee.org
Ever-growing artificial intelligence (AI) applications have greatly reshaped our world in
many areas, e.g., smart home, computer vision, natural language processing, etc. Behind …