Random reshuffling: Simple analysis with vast improvements
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes
iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its …
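As a concrete illustration of the update pattern this snippet describes (gradient steps over a freshly shuffled data order each epoch), here is a minimal Python sketch; the gradient oracle `grad_i`, step size, and seed are assumed for illustration and are not taken from the paper.

```python
import numpy as np

def random_reshuffling(grad_i, x0, n, epochs, lr):
    """Minimal sketch of Random Reshuffling (RR) for a finite-sum objective
    f(x) = (1/n) * sum_i f_i(x); grad_i(x, i) returns the gradient of the
    i-th component. Names are illustrative, not the paper's code."""
    x = np.asarray(x0, dtype=float)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        perm = rng.permutation(n)        # reshuffle the data indices each epoch
        for i in perm:                   # one pass: every component used exactly once
            x = x - lr * grad_i(x, i)    # plain gradient step on component i
    return x
```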
On the convergence of federated averaging with cyclic client participation
Federated Averaging (FedAvg) and its variants are the most popular optimization
algorithms in federated learning (FL). Previous convergence analyses of FedAvg either …
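A hedged sketch of FedAvg with cyclic client participation, the setting this entry studies: client groups are visited in a fixed cyclic order, each selected client runs a few local SGD steps, and the server averages the results. The group structure, step counts, and gradient oracles are assumptions for illustration.

```python
import numpy as np

def fedavg_cyclic(client_grads, x0, client_groups, rounds, local_steps, lr):
    """Sketch of FedAvg with cyclic client participation (illustrative names)."""
    x = np.asarray(x0, dtype=float)
    for r in range(rounds):
        group = client_groups[r % len(client_groups)]   # cyclic group selection
        local_models = []
        for c in group:
            y = x.copy()
            for _ in range(local_steps):                 # local SGD on client c
                y = y - lr * client_grads[c](y)
            local_models.append(y)
        x = np.mean(local_models, axis=0)                # server-side averaging
    return x
```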
Recent theoretical advances in non-convex optimization
Motivated by recent increased interest in optimization algorithms for non-convex
optimization in application to training deep neural networks and other optimization problems …
On the impact of machine learning randomness on group fairness
Statistical measures for group fairness in machine learning reflect the gap in performance of
algorithms across different groups. These measures, however, exhibit a high variance …
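One common instance of such a measure is the accuracy gap between two groups; the sketch below shows how it is computed and how its variance across random seeds could be probed. The two-group setup and all names are assumptions, not the paper's exact measures.

```python
import numpy as np

def group_accuracy_gap(y_true, y_pred, group):
    """Accuracy gap between two groups, one common group-fairness measure."""
    g0 = group == 0
    g1 = group == 1
    acc0 = np.mean(y_true[g0] == y_pred[g0])
    acc1 = np.mean(y_true[g1] == y_pred[g1])
    return abs(acc0 - acc1)

# The variance highlighted in the snippet can be probed by retraining under
# different seeds and inspecting the spread of the gap (hypothetical helper):
# gaps = [group_accuracy_gap(y, train_and_predict(seed=s), group) for s in range(20)]
# print(np.mean(gaps), np.std(gaps))
```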
SGD with shuffling: optimal rates without component convexity and large epoch requirements
We study without-replacement SGD for solving finite-sum optimization problems.
Specifically, depending on how the indices of the finite-sum are shuffled, we consider the …
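The truncated snippet refers to different ways of shuffling the component indices; a sketch of the orderings commonly compared in without-replacement SGD analyses follows. The scheme names reflect common usage and are an assumption, since the entry does not list the paper's exact variants.

```python
import numpy as np

def index_order(scheme, n, epoch, rng):
    """Index orderings commonly compared for without-replacement SGD."""
    if scheme == "random_reshuffling":
        return rng.permutation(n)                    # new permutation every epoch
    if scheme == "shuffle_once":
        fixed_rng = np.random.default_rng(0)         # same permutation reused each epoch
        return fixed_rng.permutation(n)
    if scheme == "incremental":
        return np.arange(n)                          # deterministic 0..n-1 order
    raise ValueError(scheme)
```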
AsGrad: A sharp unified analysis of asynchronous-SGD algorithms
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting,
where each worker has its own computation and communication speeds, as well as data …
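A toy simulation of the asynchronous setting this entry analyzes, where slow workers apply gradients computed at stale iterates; the delay model and gradient oracles are assumptions for illustration, not the AsGrad algorithms themselves.

```python
import numpy as np

def async_sgd(grads, x0, steps, lr, max_delay, rng=None):
    """Toy asynchronous SGD: each update uses a gradient evaluated at a stale
    iterate, mimicking workers with different computation/communication speeds."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for t in range(steps):
        delay = int(rng.integers(0, min(max_delay, t) + 1))
        stale_x = history[-1 - delay]                # iterate the slow worker saw
        worker = int(rng.integers(0, len(grads)))    # heterogeneous data: pick a worker
        x = x - lr * grads[worker](stale_x)
        history.append(x.copy())
    return x
```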
Convergence of random reshuffling under the Kurdyka–Łojasiewicz inequality
We study the random reshuffling (RR) method for smooth nonconvex optimization problems
with a finite-sum structure. Though this method is widely utilized in practice, e.g., in the …
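For reference, one standard Łojasiewicz-type form of the Kurdyka–Łojasiewicz condition used in such nonconvex analyses is given below; the desingularizing function and exponent actually used in the paper are not visible in the snippet, so this is a generic statement.

```latex
% Lojasiewicz-type form of the KL inequality near a critical point x^*:
% there exist c > 0 and \theta \in [1/2, 1) such that, locally,
\|\nabla f(x)\| \;\ge\; c \, \bigl| f(x) - f(x^*) \bigr|^{\theta}.
```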
Minibatch vs local SGD with shuffling: Tight convergence bounds and beyond
In distributed learning, local SGD (also known as federated averaging) and its simple
baseline minibatch SGD are widely studied optimization methods. Most existing analyses of …
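To make the comparison in this entry concrete, here is a hedged sketch of one communication round under each method: minibatch SGD takes a single step on the averaged gradient, while local SGD lets each worker take several steps before averaging models. All names and the round structure are illustrative assumptions.

```python
import numpy as np

def minibatch_sgd_round(grads, x, lr):
    """One round of minibatch SGD: average workers' gradients, take one step."""
    g = np.mean([grad(x) for grad in grads], axis=0)
    return x - lr * g

def local_sgd_round(grads, x, lr, local_steps):
    """One round of local SGD (federated averaging): each worker takes several
    steps from the same starting point, then the models are averaged."""
    local_models = []
    for grad in grads:
        y = x.copy()
        for _ in range(local_steps):
            y = y - lr * grad(y)
        local_models.append(y)
    return np.mean(local_models, axis=0)
```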
Adaptive step size rules for stochastic optimization in large-scale learning
Z Yang, L Ma - Statistics and Computing, 2023 - Springer
The importance of the step size in stochastic optimization has been confirmed both
theoretically and empirically during the past few decades and reconsidered in recent years …
Why globally re-shuffle? Revisiting data shuffling in large scale deep learning
Stochastic gradient descent (SGD) is the most prevalent algorithm for training Deep Neural
Networks (DNN). SGD iterates the input data set in each training epoch processing data …
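A sketch of the two per-epoch shuffling schemes typically contrasted in this setting: global reshuffling permutes the whole dataset before partitioning it across workers, whereas local shuffling fixes each worker's shard and only permutes within it. Function and parameter names are assumptions for illustration.

```python
import numpy as np

def epoch_order(n_samples, n_workers, worker_id, mode, epoch_seed):
    """Indices a given worker processes in one epoch under global vs local shuffling."""
    rng = np.random.default_rng(epoch_seed)
    if mode == "global":
        order = rng.permutation(n_samples)                       # full reshuffle each epoch
        return np.array_split(order, n_workers)[worker_id]
    if mode == "local":
        shard = np.array_split(np.arange(n_samples), n_workers)[worker_id]
        return rng.permutation(shard)                             # shuffle only within the shard
    raise ValueError(mode)
```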