Random reshuffling: Simple analysis with vast improvements
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its …
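A minimal sketch of the RR update pattern described in the snippet, assuming a per-sample gradient oracle `grad_f(x, i)` (a hypothetical helper, not from the paper): each epoch draws a fresh permutation of the n indices and takes one incremental gradient step per sample.

```python
import numpy as np

def random_reshuffling(x0, grad_f, n, lr=0.01, epochs=10, seed=0):
    """Minimize (1/n) * sum_i f_i(x) by stepping through a fresh
    permutation of the sample indices in every epoch."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(n):      # each sample is used exactly once per epoch
            x = x - lr * grad_f(x, i)     # incremental gradient step on sample i
    return x
```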
Improving the sample and communication complexity for decentralized non-convex optimization: Joint gradient estimation and tracking
Many modern large-scale machine learning problems benefit from decentralized and stochastic optimization. Recent works have shown that utilizing both decentralized …
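For context on the "gradient tracking" part of the title, a basic decentralized gradient-tracking recursion over a mixing matrix W looks roughly as follows. This is a generic sketch (with an assumed per-agent gradient oracle `local_grad`), not the joint estimation-and-tracking scheme proposed in the paper.

```python
import numpy as np

def gradient_tracking(X0, local_grad, W, lr=0.1, iters=100):
    """Generic decentralized gradient tracking.
    X0: (m, d) stacked agent iterates; W: (m, m) doubly-stochastic mixing matrix;
    local_grad(i, x): gradient of agent i's local cost at x."""
    m, d = X0.shape
    X = X0.copy()
    G = np.stack([local_grad(i, X[i]) for i in range(m)])   # gradient trackers
    grads = G.copy()                                         # last local gradients
    for _ in range(iters):
        X = W @ X - lr * G                                   # consensus step plus descent along the tracker
        new_grads = np.stack([local_grad(i, X[i]) for i in range(m)])
        G = W @ G + new_grads - grads                        # track the network-average gradient
        grads = new_grads
    return X
```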
How good is SGD with random shuffling?
We study the performance of stochastic gradient descent (SGD) on smooth and strongly-convex finite-sum optimization problems. In contrast to the majority of existing theoretical …
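For reference, the problem class in question is the smooth, strongly-convex finite-sum objective, and one epoch of shuffled SGD takes n component steps in a permuted order (generic notation, assumed here rather than quoted from the paper):

```latex
\min_{x \in \mathbb{R}^d} \; F(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x),
\qquad
x_t^{k} = x_t^{k-1} - \eta \, \nabla f_{\pi_t(k)}\bigl(x_t^{k-1}\bigr),
\quad k = 1, \dots, n,
```

where each f_i is smooth, F is strongly convex, \pi_t is the permutation drawn for epoch t, and x_{t+1}^{0} = x_t^{n}.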
Closing the convergence gap of SGD without replacement
Stochastic gradient descent without replacement sampling is widely used in practice for model training. However, the vast majority of SGD analyses assumes data is sampled with …
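The distinction at issue is only in how indices are drawn within an epoch; a minimal illustration of the two sampling schemes (assuming numpy):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8

with_replacement = rng.integers(0, n, size=n)   # i.i.d. draws: indices may repeat or be skipped
without_replacement = rng.permutation(n)        # shuffled pass: every index appears exactly once
```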
Effective model sparsification by scheduled grow-and-prune methods
Deep neural networks (DNNs) are effective in solving many real-world problems. Larger DNN models usually exhibit better quality (e.g., accuracy), but their excessive computation …
Random shuffling beats SGD only after many epochs on ill-conditioned problems
Recently, there has been much interest in studying the convergence rates of without-replacement SGD, and proving that it is faster than with-replacement SGD in the worst case …
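As an illustration of the setting named in the title (not the paper's experiment), one can build an ill-conditioned one-dimensional least-squares problem by spreading the component curvatures over several orders of magnitude and then run shuffled versus with-replacement SGD on it:

```python
import numpy as np

rng = np.random.default_rng(1)
n, kappa = 100, 1e4                                # number of components, target condition number
a = np.sqrt(np.logspace(0, np.log10(kappa), n))    # per-component curvatures span [1, kappa]
b = rng.normal(size=n)
x_star = np.sum(a * b) / np.sum(a**2)              # minimizer of (1/n) * sum_i (a_i x - b_i)^2

def run(shuffled, lr=1e-5, epochs=50):
    x = 0.0
    for _ in range(epochs):
        idx = rng.permutation(n) if shuffled else rng.integers(0, n, size=n)
        for i in idx:
            x -= lr * 2 * a[i] * (a[i] * x - b[i])  # gradient of (a_i x - b_i)^2
    return abs(x - x_star)

print("shuffled error:", run(True), " with-replacement error:", run(False))
```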
In-database machine learning with CorgiPile: Stochastic gradient descent without full data shuffle
Stochastic gradient descent (SGD) is the cornerstone of modern ML systems. Despite its computational efficiency, SGD requires random data access that is inherently inefficient …
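One common way to avoid fully random data access is a buffered, block-level shuffle: visit contiguous blocks in shuffled order, fill a small in-memory buffer with a few blocks, and shuffle within the buffer. The sketch below is a generic two-level shuffle for illustration and is not claimed to match CorgiPile's exact algorithm.

```python
import random

def block_shuffled_stream(dataset, block_size=1024, buffer_blocks=8, seed=0):
    """Yield samples in a pseudo-random order using only sequential block reads:
    blocks are visited in shuffled order, buffered, and shuffled within the buffer."""
    rng = random.Random(seed)
    blocks = [dataset[i:i + block_size] for i in range(0, len(dataset), block_size)]
    rng.shuffle(blocks)                          # first level: shuffle the block order
    for start in range(0, len(blocks), buffer_blocks):
        buffer = [x for blk in blocks[start:start + buffer_blocks] for x in blk]
        rng.shuffle(buffer)                      # second level: shuffle within the buffer
        yield from buffer
```

A consumer would simply take one SGD step per yielded sample, trading a perfectly uniform shuffle for sequential I/O.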
Distributed random reshuffling over networks
In this paper, we consider distributed optimization problems where agents, each possessing a local cost function, collaboratively minimize the average of the local cost functions over a …
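The setup described here is the standard networked consensus-optimization problem, where m agents jointly minimize the average of their local finite-sum costs (generic notation, assumed rather than taken from the paper):

```latex
\min_{x \in \mathbb{R}^d} \; f(x) = \frac{1}{m} \sum_{i=1}^{m} f_i(x),
\qquad
f_i(x) = \frac{1}{n_i} \sum_{\ell=1}^{n_i} f_{i,\ell}(x),
```

where f_i is held privately by agent i, the f_{i,\ell} are its local samples, and communication is restricted to neighbors in the network graph.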
Random reshuffling with variance reduction: New analysis and better rates
Virtually all state-of-the-art methods for training supervised machine learning models are variants of Stochastic Gradient Descent (SGD), enhanced with a number of additional tricks …
Variance-reduced stochastic learning under random reshuffling
Several useful variance-reduced stochastic gradient algorithms, such as SVRG, SAGA, Finito, and SAG, have been proposed to minimize empirical risks with linear convergence …
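Of the methods listed, SAGA is perhaps the simplest to sketch: keep a table of the most recent gradient of each component and correct every step with it. Below is a generic SAGA-style update driven by a reshuffled (permuted) pass over the data; it assumes a per-sample gradient oracle `grad_f(x, i)` and is illustrative only, not the particular variant analyzed in the paper.

```python
import numpy as np

def saga_reshuffled(x0, grad_f, n, lr=0.01, epochs=10, seed=0):
    """SAGA-style variance reduction with a without-replacement (shuffled) sampling order."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    table = np.stack([grad_f(x, i) for i in range(n)])   # stored per-sample gradients
    avg = table.mean(axis=0)                             # running average of the table
    for _ in range(epochs):
        for i in rng.permutation(n):
            g_new = grad_f(x, i)
            x = x - lr * (g_new - table[i] + avg)        # variance-reduced step
            avg = avg + (g_new - table[i]) / n           # keep the table average current
            table[i] = g_new
    return x
```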