Random reshuffling: Simple analysis with vast improvements
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes
iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its …
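As a concrete illustration of the update pattern this snippet describes (gradient steps over a freshly shuffled data order each epoch), here is a minimal Python sketch; the gradient oracle `grad_i`, step size, and seed are assumed for illustration and are not taken from the paper.

```python
import numpy as np

def random_reshuffling(grad_i, x0, n, epochs, lr):
    """Minimal sketch of Random Reshuffling (RR) for a finite-sum objective
    f(x) = (1/n) * sum_i f_i(x); grad_i(x, i) returns the gradient of the
    i-th component. Names are illustrative, not the paper's code."""
    x = np.asarray(x0, dtype=float)
    rng = np.random.default_rng(0)
    for _ in range(epochs):
        perm = rng.permutation(n)        # reshuffle the data indices each epoch
        for i in perm:                   # one pass: every component used exactly once
            x = x - lr * grad_i(x, i)    # plain gradient step on component i
    return x
```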
On the convergence of federated averaging with cyclic client participation
Federated Averaging (FedAvg) and its variants are the most popular optimization
algorithms in federated learning (FL). Previous convergence analyses of FedAvg either …
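A hedged sketch of FedAvg with cyclic client participation, the setting this entry studies: client groups are visited in a fixed cyclic order, each selected client runs a few local SGD steps, and the server averages the results. The group structure, step counts, and gradient oracles are assumptions for illustration.

```python
import numpy as np

def fedavg_cyclic(client_grads, x0, client_groups, rounds, local_steps, lr):
    """Sketch of FedAvg with cyclic client participation (illustrative names)."""
    x = np.asarray(x0, dtype=float)
    for r in range(rounds):
        group = client_groups[r % len(client_groups)]   # cyclic group selection
        local_models = []
        for c in group:
            y = x.copy()
            for _ in range(local_steps):                 # local SGD on client c
                y = y - lr * client_grads[c](y)
            local_models.append(y)
        x = np.mean(local_models, axis=0)                # server-side averaging
    return x
```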
Recent theoretical advances in non-convex optimization
Motivated by recent increased interest in optimization algorithms for non-convex
optimization in application to training deep neural networks and other optimization problems …
On the impact of machine learning randomness on group fairness
Statistical measures for group fairness in machine learning reflect the gap in performance of
algorithms across different groups. These measures, however, exhibit a high variance …
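One common instance of such a measure is the accuracy gap between two groups; the sketch below shows how it is computed and how its variance across random seeds could be probed. The two-group setup and all names are assumptions, not the paper's exact measures.

```python
import numpy as np

def group_accuracy_gap(y_true, y_pred, group):
    """Accuracy gap between two groups, one common group-fairness measure."""
    g0 = group == 0
    g1 = group == 1
    acc0 = np.mean(y_true[g0] == y_pred[g0])
    acc1 = np.mean(y_true[g1] == y_pred[g1])
    return abs(acc0 - acc1)

# The variance highlighted in the snippet can be probed by retraining under
# different seeds and inspecting the spread of the gap (hypothetical helper):
# gaps = [group_accuracy_gap(y, train_and_predict(seed=s), group) for s in range(20)]
# print(np.mean(gaps), np.std(gaps))
```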
SGD with shuffling: optimal rates without component convexity and large epoch requirements
We study without-replacement SGD for solving finite-sum optimization problems.
Specifically, depending on how the indices of the finite-sum are shuffled, we consider the …
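The truncated snippet refers to different ways of shuffling the component indices; a sketch of the orderings commonly compared in without-replacement SGD analyses follows. The scheme names reflect common usage and are an assumption, since the entry does not list the paper's exact variants.

```python
import numpy as np

def index_order(scheme, n, epoch, rng):
    """Index orderings commonly compared for without-replacement SGD."""
    if scheme == "random_reshuffling":
        return rng.permutation(n)                    # new permutation every epoch
    if scheme == "shuffle_once":
        fixed_rng = np.random.default_rng(0)         # same permutation reused each epoch
        return fixed_rng.permutation(n)
    if scheme == "incremental":
        return np.arange(n)                          # deterministic 0..n-1 order
    raise ValueError(scheme)
```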
AsGrad: A sharp unified analysis of asynchronous-SGD algorithms
We analyze asynchronous-type algorithms for distributed SGD in the heterogeneous setting,
where each worker has its own computation and communication speeds, as well as data …
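A toy simulation of the asynchronous setting this entry analyzes, where slow workers apply gradients computed at stale iterates; the delay model and gradient oracles are assumptions for illustration, not the AsGrad algorithms themselves.

```python
import numpy as np

def async_sgd(grads, x0, steps, lr, max_delay, rng=None):
    """Toy asynchronous SGD: each update uses a gradient evaluated at a stale
    iterate, mimicking workers with different computation/communication speeds."""
    if rng is None:
        rng = np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for t in range(steps):
        delay = int(rng.integers(0, min(max_delay, t) + 1))
        stale_x = history[-1 - delay]                # iterate the slow worker saw
        worker = int(rng.integers(0, len(grads)))    # heterogeneous data: pick a worker
        x = x - lr * grads[worker](stale_x)
        history.append(x.copy())
    return x
```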
Convergence of random reshuffling under the Kurdyka–Łojasiewicz inequality
We study the random reshuffling (RR) method for smooth nonconvex optimization problems
with a finite-sum structure. Though this method is widely utilized in practice, e.g., in the …
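For reference, one standard Łojasiewicz-type form of the Kurdyka–Łojasiewicz condition used in such nonconvex analyses is given below; the desingularizing function and exponent actually used in the paper are not visible in the snippet, so this is a generic statement.

```latex
% Lojasiewicz-type form of the KL inequality near a critical point x^*:
% there exist c > 0 and \theta \in [1/2, 1) such that, locally,
\|\nabla f(x)\| \;\ge\; c \, \bigl| f(x) - f(x^*) \bigr|^{\theta}.
```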
Minibatch vs local SGD with shuffling: Tight convergence bounds and beyond
In distributed learning, local SGD (also known as federated averaging) and its simple
baseline minibatch SGD are widely studied optimization methods. Most existing analyses of …
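To make the comparison in this entry concrete, here is a hedged sketch of one communication round under each method: minibatch SGD takes a single step on the averaged gradient, while local SGD lets each worker take several steps before averaging models. All names and the round structure are illustrative assumptions.

```python
import numpy as np

def minibatch_sgd_round(grads, x, lr):
    """One round of minibatch SGD: average workers' gradients, take one step."""
    g = np.mean([grad(x) for grad in grads], axis=0)
    return x - lr * g

def local_sgd_round(grads, x, lr, local_steps):
    """One round of local SGD (federated averaging): each worker takes several
    steps from the same starting point, then the models are averaged."""
    local_models = []
    for grad in grads:
        y = x.copy()
        for _ in range(local_steps):
            y = y - lr * grad(y)
        local_models.append(y)
    return np.mean(local_models, axis=0)
```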
Adaptive step size rules for stochastic optimization in large-scale learning
Z Yang, L Ma - Statistics and Computing, 2023 - Springer
The importance of the step size in stochastic optimization has been confirmed both
theoretically and empirically during the past few decades and reconsidered in recent years …
Why globally re-shuffle? Revisiting data shuffling in large scale deep learning
Stochastic gradient descent (SGD) is the most prevalent algorithm for training Deep Neural
Networks (DNN). SGD iterates the input data set in each training epoch processing data …
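A sketch of the two per-epoch shuffling schemes typically contrasted in this setting: global reshuffling permutes the whole dataset before partitioning it across workers, whereas local shuffling fixes each worker's shard and only permutes within it. Function and parameter names are assumptions for illustration.

```python
import numpy as np

def epoch_order(n_samples, n_workers, worker_id, mode, epoch_seed):
    """Indices a given worker processes in one epoch under global vs local shuffling."""
    rng = np.random.default_rng(epoch_seed)
    if mode == "global":
        order = rng.permutation(n_samples)                       # full reshuffle each epoch
        return np.array_split(order, n_workers)[worker_id]
    if mode == "local":
        shard = np.array_split(np.arange(n_samples), n_workers)[worker_id]
        return rng.permutation(shard)                             # shuffle only within the shard
    raise ValueError(mode)
```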