Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research due to various reasons. First, its …

Scaffold: Stochastic controlled averaging for federated learning

SP Karimireddy, S Kale, M Mohri… - International …, 2020 - proceedings.mlr.press
Federated learning is a key scenario in modern large-scale machine learning where the
data remains distributed over a large number of clients and the task is to learn a centralized …
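
The core of SCAFFOLD is a control-variate correction applied during each client's local steps. Below is a minimal sketch of that correction; the single-client loop, variable names, and `grad_fn` interface are illustrative assumptions, not the paper's reference code.

```python
import numpy as np

def scaffold_local_update(x_global, c_global, c_local, grad_fn, lr=0.1, num_steps=10):
    """One client's local pass with SCAFFOLD-style drift correction.

    grad_fn(y) should return a stochastic gradient of the client's local loss at y.
    The term (c_global - c_local) corrects each local step so that repeated
    local updates drift less toward the client's own optimum.
    """
    y = x_global.copy()
    for _ in range(num_steps):
        g = grad_fn(y)
        y = y - lr * (g - c_local + c_global)          # corrected local SGD step
    # Control-variate refresh in the style of the paper's "option II" (hedged paraphrase):
    c_local_new = c_local - c_global + (x_global - y) / (num_steps * lr)
    delta_x = y - x_global                              # model delta sent to the server
    delta_c = c_local_new - c_local                     # control-variate delta sent to the server
    return delta_x, delta_c, c_local_new
```
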

Federated learning: Challenges, methods, and future directions

T Li, AK Sahu, A Talwalkar… - IEEE signal processing …, 2020 - ieeexplore.ieee.org
Federated learning involves training statistical models over remote devices or siloed data
centers, such as mobile phones or hospitals, while keeping data localized. Training in …
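
For context, a minimal sketch of the FedAvg-style aggregation step that such federated training builds on: clients train locally on data that never leaves the device, and the server takes a data-size-weighted average of the returned weights. The helper name and the toy client interface are assumptions for illustration.

```python
import numpy as np

def fedavg_round(global_weights, client_updates):
    """Aggregate locally trained weights by a data-size-weighted average.

    client_updates: list of (local_weights, num_local_examples) pairs returned
    by clients after training on their own local data.
    """
    total = sum(n for _, n in client_updates)
    new_weights = np.zeros_like(global_weights)
    for local_weights, n in client_updates:
        new_weights += (n / total) * local_weights
    return new_weights
```
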

Federated optimization in heterogeneous networks

T Li, AK Sahu, M Zaheer, M Sanjabi… - … of Machine learning …, 2020 - proceedings.mlsys.org
Federated Learning is a distributed learning paradigm with two key challenges that
differentiate it from traditional distributed optimization: (1) significant variability in terms of the …
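
FedProx, the method proposed in this paper, handles heterogeneity by adding a proximal term to each client's local objective. A minimal sketch follows; `loss_fn`, `grad_fn`, and the default `mu` are illustrative placeholders rather than the paper's reference implementation.

```python
import numpy as np

def fedprox_local_loss(w, w_global, loss_fn, mu=0.01):
    """Local FedProx objective: task loss plus a proximal term that keeps the
    local iterate w close to the current global model w_global."""
    return loss_fn(w) + (mu / 2.0) * np.sum((w - w_global) ** 2)

def fedprox_local_step(w, w_global, grad_fn, mu=0.01, lr=0.1):
    """One local SGD step on the proximal objective; grad_fn returns the task gradient."""
    return w - lr * (grad_fn(w) + mu * (w - w_global))
```
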

HarmoFL: Harmonizing local and global drifts in federated learning on heterogeneous medical images

M Jiang, Z Wang, Q Dou - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
Multiple medical institutions collaboratively training a model using federated learning (FL)
has become a promising solution for maximizing the potential of data-driven models, yet the …

Adam can converge without any modification on update rules

Y Zhang, C Chen, N Shi, R Sun… - Advances in neural …, 2022 - proceedings.neurips.cc
Ever since Reddi et al. (2019) pointed out the divergence issue of Adam, many
new variants have been designed to obtain convergence. However, vanilla Adam remains …
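
For reference, a minimal sketch of the unmodified Adam update rule that this analysis concerns, in its standard form; the hyperparameter defaults follow common practice and are not taken from this paper.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One vanilla Adam step at (1-indexed) iteration t: exponential moving
    averages of the gradient and its square, bias correction, then a
    coordinate-wise scaled update."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```
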

SGD: General analysis and improved rates

RM Gower, N Loizou, X Qian… - International …, 2019 - proceedings.mlr.press
We propose a general yet simple theorem describing the convergence of SGD under the
arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of …
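
The iteration under study is the standard stochastic gradient step, written here in a generic arbitrary-sampling style (notation is generic and not copied from the paper):

```latex
x_{k+1} \;=\; x_k - \gamma_k \,\nabla f_{v_k}(x_k), \qquad v_k \sim \mathcal{D},
\qquad \mathbb{E}_{v \sim \mathcal{D}}\big[\nabla f_v(x)\big] = \nabla f(x),
```

where \(\mathcal{D}\) is the sampling distribution over minibatches and the gradient estimator is unbiased.
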

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks

C Liu, L Zhu, M Belkin - Applied and Computational Harmonic Analysis, 2022 - Elsevier
The success of deep learning is due, to a large extent, to the remarkable effectiveness of
gradient-based optimization methods applied to large neural networks. The purpose of this …
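
A key ingredient in this line of work is a Polyak–Łojasiewicz-type condition on the loss landscape. In its standard form (stated generically here, not quoting the paper's exact variant):

```latex
\tfrac{1}{2}\,\big\lVert \nabla L(w) \big\rVert^2 \;\ge\; \mu\,\big(L(w) - L^{*}\big)
\qquad \text{for all } w,
```

under which gradient descent with a suitable step size converges linearly to a global minimizer even though the loss is non-convex.
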

Mime: Mimicking centralized stochastic algorithms in federated learning

SP Karimireddy, M Jaggi, S Kale, M Mohri… - arXiv preprint arXiv …, 2020 - arxiv.org
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of
the data across different clients, which gives rise to the client drift phenomenon. In fact …

Reasonable effectiveness of random weighting: A litmus test for multi-task learning

B Lin, F Ye, Y Zhang, IW Tsang - arXiv preprint arXiv:2111.10603, 2021 - arxiv.org
Multi-Task Learning (MTL) has achieved success in various fields. However, how to balance
different tasks to achieve good performance is a key problem. To achieve the task balancing …
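
A minimal sketch of the random loss weighting idea the title refers to: draw a fresh set of task weights at every training step and combine the per-task losses with them. The normal-then-softmax sampling shown here is one common variant and is an assumption, not necessarily the paper's exact scheme.

```python
import numpy as np

def random_loss_weights(num_tasks, rng=None):
    """Sample a fresh set of task weights for one training step: draw random
    scores and pass them through a softmax so the weights are positive and sum to 1."""
    rng = rng or np.random.default_rng()
    scores = rng.normal(size=num_tasks)
    exp_scores = np.exp(scores - scores.max())
    return exp_scores / exp_scores.sum()

# Usage: combine per-task losses with freshly sampled weights at every step.
# weights = random_loss_weights(len(task_losses))
# total_loss = sum(w * l for w, l in zip(weights, task_losses))
```
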