Random reshuffling: Simple analysis with vast improvements

K Mishchenko, A Khaled… - Advances in Neural …, 2020 - proceedings.neurips.cc
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes
iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its …
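The snippet describes RR at a high level: once per epoch, draw a fresh permutation of the data and take one gradient step per component in that order. A minimal sketch in Python, assuming a hypothetical `grad_i(i, x)` oracle for the gradient of the i-th component; the step size and epoch count are placeholders, not the paper's settings.

```python
import numpy as np

def random_reshuffling(grad_i, x0, n, step_size=0.01, epochs=10, seed=0):
    """Sketch of Random Reshuffling (RR) for f(x) = (1/n) * sum_i f_i(x).

    grad_i(i, x) is assumed to return the gradient of the i-th component f_i at x;
    the oracle name and signature are illustrative, not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        perm = rng.permutation(n)          # reshuffle the data once per epoch
        for i in perm:                     # one full pass, sampling without replacement
            x = x - step_size * grad_i(i, x)
    return x
```

The without-replacement pass within each epoch is what distinguishes RR from standard SGD, which would draw the index i uniformly with replacement at every step.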

Improving the sample and communication complexity for decentralized non-convex optimization: Joint gradient estimation and tracking

H Sun, S Lu, M Hong - International Conference on Machine …, 2020 - proceedings.mlr.press
Many modern large-scale machine learning problems benefit from decentralized and
stochastic optimization. Recent works have shown that utilizing both decentralized …
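The snippet stops at the motivation, so the paper's joint estimation-and-tracking scheme is not reproduced here. As background, a minimal sketch of plain decentralized gradient tracking over a doubly stochastic mixing matrix `W`, assuming deterministic local gradients; the paper builds on this idea with stochastic gradient estimation, which the sketch omits.

```python
import numpy as np

def gradient_tracking(local_grads, W, x0, step_size=0.05, iters=200):
    """Sketch of decentralized gradient tracking (background, not the paper's method).

    local_grads: list of callables, local_grads[a](x_a) -> gradient of agent a's cost
    W: (n_agents, n_agents) doubly stochastic mixing matrix for the network
    x0: (n_agents, d) initial iterates, one row per agent
    All names and shapes here are illustrative assumptions.
    """
    x = np.array(x0, dtype=float)
    g = np.stack([local_grads[a](x[a]) for a in range(len(local_grads))])
    y = g.copy()                                   # tracker of the average gradient
    for _ in range(iters):
        x_new = W @ x - step_size * y              # consensus step plus descent along tracker
        g_new = np.stack([local_grads[a](x_new[a]) for a in range(len(local_grads))])
        y = W @ y + g_new - g                      # update the gradient tracker
        x, g = x_new, g_new
    return x.mean(axis=0)
```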

How good is SGD with random shuffling?

I Safran, O Shamir - Conference on Learning Theory, 2020 - proceedings.mlr.press
We study the performance of stochastic gradient descent (SGD) on smooth and strongly-
convex finite-sum optimization problems. In contrast to the majority of existing theoretical …

Closing the convergence gap of SGD without replacement

S Rajput, A Gupta… - … Conference on Machine …, 2020 - proceedings.mlr.press
Stochastic gradient descent without replacement sampling is widely used in practice for
model training. However, the vast majority of SGD analyses assumes data is sampled with …

Effective model sparsification by scheduled grow-and-prune methods

X Ma, M Qin, F Sun, Z Hou, K Yuan, Y Xu… - arXiv preprint arXiv …, 2021 - arxiv.org
Deep neural networks (DNNs) are effective in solving many real-world problems. Larger
DNN models usually exhibit better quality (e.g., accuracy) but their excessive computation …

Random shuffling beats SGD only after many epochs on ill-conditioned problems

I Safran, O Shamir - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Recently, there has been much interest in studying the convergence rates of without-
replacement SGD, and proving that it is faster than with-replacement SGD in the worst case …

In-database machine learning with CorgiPile: Stochastic gradient descent without full data shuffle

L Xu, S Qiu, B Yuan, J Jiang, C Renggli, S Gan… - Proceedings of the …, 2022 - dl.acm.org
Stochastic gradient descent (SGD) is the cornerstone of modern ML systems. Despite its
computational efficiency, SGD requires random data access that is inherently inefficient …
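The snippet covers only the motivation, and CorgiPile's own algorithm is not reproduced here. As an illustration of how shuffling can be approximated without a full random pass over storage, a sketch of a two-level scheme: randomize the order in which contiguous blocks are read, then shuffle tuples inside a bounded in-memory buffer. Block and buffer sizes are illustrative assumptions.

```python
import random

def two_level_shuffle(blocks, buffer_blocks=2, seed=0):
    """Sketch of a two-level shuffle that avoids shuffling the full dataset.

    blocks: list of lists, each inner list holding tuples stored contiguously on disk.
    buffer_blocks: how many blocks fit in the shuffle buffer (illustrative assumption).
    Yields tuples in a partially randomized order while keeping reads mostly sequential.
    """
    rng = random.Random(seed)
    order = list(range(len(blocks)))
    rng.shuffle(order)                       # level 1: randomize the block read order
    buffer, held = [], 0
    for b in order:
        buffer.extend(blocks[b])
        held += 1
        if held == buffer_blocks:            # buffer full: shuffle and stream it out
            rng.shuffle(buffer)              # level 2: shuffle tuples within the buffer
            yield from buffer
            buffer, held = [], 0
    rng.shuffle(buffer)                      # flush whatever remains
    yield from buffer
```

An SGD loop can consume this stream in place of a fully shuffled dataset, trading some randomness for sequential I/O.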

Distributed random reshuffling over networks

K Huang, X Li, A Milzarek, S Pu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In this paper, we consider distributed optimization problems where agents, each possessing
a local cost function, collaboratively minimize the average of the local cost functions over a …
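To make the problem setup concrete, a sketch of one way to combine local reshuffling with consensus averaging: each agent runs Random Reshuffling on its own data and mixes iterates with its neighbors after every step. The mixing matrix `W` and the `grad(a, i, x)` oracle are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def decentralized_rr(grad, W, x0, n_samples, step_size=0.01, epochs=10, seed=0):
    """Sketch: local Random Reshuffling interleaved with neighbor averaging.

    grad(a, i, x): gradient of agent a's i-th local component at x (hypothetical oracle)
    W: (n_agents, n_agents) doubly stochastic mixing matrix
    x0: (n_agents, d) initial iterates, one row per agent
    n_samples: number of local components per agent (assumed equal across agents)
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    n_agents = x.shape[0]
    for _ in range(epochs):
        perms = [rng.permutation(n_samples) for _ in range(n_agents)]  # local reshuffles
        for t in range(n_samples):
            grads = np.stack([grad(a, perms[a][t], x[a]) for a in range(n_agents)])
            x = W @ x - step_size * grads    # mix with neighbors, then take the local RR step
    return x.mean(axis=0)
```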

Random reshuffling with variance reduction: New analysis and better rates

G Malinovsky, A Sailanbayev… - Uncertainty in Artificial …, 2023 - proceedings.mlr.press
Virtually all state-of-the-art methods for training supervised machine learning models are
variants of Stochastic Gradient Descent (SGD), enhanced with a number of additional tricks …

Variance-reduced stochastic learning under random reshuffling

B Ying, K Yuan, AH Sayed - IEEE Transactions on Signal …, 2020 - ieeexplore.ieee.org
Several useful variance-reduced stochastic gradient algorithms, such as SVRG, SAGA,
Finito, and SAG, have been proposed to minimize empirical risks with linear convergence …
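To illustrate the combination the snippet refers to, a sketch of an SVRG-style control variate applied inside reshuffled epochs: a snapshot full gradient is computed once per epoch, and the data is then traversed in a fresh random order with variance-reduced steps. The `grad_i(i, x)` oracle is a hypothetical placeholder, and this is not the paper's specific recursion.

```python
import numpy as np

def rr_svrg(grad_i, x0, n, step_size=0.01, epochs=10, seed=0):
    """Sketch: SVRG-style variance reduction under random reshuffling.

    grad_i(i, x) -> gradient of the i-th component f_i at x (hypothetical oracle).
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        snapshot = x.copy()
        full_grad = np.mean([grad_i(i, snapshot) for i in range(n)], axis=0)
        for i in rng.permutation(n):         # without-replacement pass over the data
            # variance-reduced direction: grad_i(x) - grad_i(snapshot) + full gradient
            d = grad_i(i, x) - grad_i(i, snapshot) + full_grad
            x = x - step_size * d
    return x
```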