Variance reduced ProxSkip: Algorithm, theory and application to federated learning
We study distributed optimization methods based on the local training (LT) paradigm, i.e., methods which achieve communication efficiency by performing richer local gradient …
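Since the snippet describes local training (LT) only at a high level, here is a minimal illustrative sketch of one LT communication round: each client performs several local gradient steps before the server averages the resulting models. The function names, step counts, and toy quadratic objectives are assumptions for illustration; this is not the paper's variance-reduced ProxSkip method itself.

```python
import numpy as np

def local_training_round(x, client_grads, lr=0.1, local_steps=5):
    """One communication round of the LT paradigm (illustrative sketch):
    every client refines the shared model locally, then the server
    averages the client models."""
    client_models = []
    for grad_fn in client_grads:          # grad_fn: a client's gradient oracle
        x_local = x.copy()
        for _ in range(local_steps):      # richer local work per round
            x_local -= lr * grad_fn(x_local)
        client_models.append(x_local)
    return np.mean(client_models, axis=0)  # server-side averaging

# Toy usage (hypothetical): two clients with f_i(x) = ||x - b_i||^2 / 2
grads = [lambda x: x - 1.0, lambda x: x + 1.0]
x = np.array([5.0])
for _ in range(20):
    x = local_training_round(x, grads)
print(x)  # approaches the consensus minimizer 0
```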
SGD: General analysis and improved rates
We propose a general yet simple theorem describing the convergence of SGD under the
arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of …
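As a rough illustration of the arbitrary sampling paradigm the abstract refers to, the sketch below runs SGD where the index is drawn from a user-chosen distribution p and the gradient is reweighted by 1/(n p_i), which keeps the estimator unbiased for the average objective. All names, parameters, and the toy quadratics are assumptions for illustration; the paper's actual theorem and rates are not reproduced here.

```python
import numpy as np

def sgd_arbitrary_sampling(grad_fns, x, probs, lr=0.05, iters=500, seed=0):
    """SGD where index i is sampled with probability probs[i] (illustrative).
    The estimate grad_fns[i](x) / (n * probs[i]) is unbiased for the
    gradient of (1/n) * sum_i f_i."""
    rng = np.random.default_rng(seed)
    n = len(grad_fns)
    for _ in range(iters):
        i = rng.choice(n, p=probs)            # arbitrary sampling law
        g = grad_fns[i](x) / (n * probs[i])   # importance reweighting
        x = x - lr * g                        # plain SGD step
    return x

# Toy usage (hypothetical): f_i(x) = (x - b_i)^2 / 2, non-uniform sampling
b = np.array([0.0, 2.0, 4.0])
grad_fns = [lambda x, bi=bi: x - bi for bi in b]
x_out = sgd_arbitrary_sampling(grad_fns, x=10.0, probs=[0.5, 0.3, 0.2])
print(x_out)  # hovers near the mean of b, i.e. 2.0
```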
A comprehensive survey on training acceleration for large machine learning models in IoT
Ever-growing artificial intelligence (AI) applications have greatly reshaped our world in many areas, e.g., smart home, computer vision, natural language processing, etc. Behind …