EF21-P and friends: Improved theoretical communication complexity for distributed optimization with bidirectional compression

K Gruntkowska, A Tyurin… - … Conference on Machine …, 2023 - proceedings.mlr.press
In this work we focus on distributed optimization problems where the communication time
between the server and the workers is non-negligible. We obtain …

Decentralized SGD and average-direction SAM are asymptotically equivalent

T Zhu, F He, K Chen, M Song… - … Conference on Machine …, 2023 - proceedings.mlr.press
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning across
massive numbers of devices simultaneously, without the control of a central server. However, existing …
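
For reference, a minimal sketch of a generic D-SGD iteration (local gradient step followed by gossip averaging over a doubly stochastic mixing matrix). The ring topology, quadratic objectives, and step size below are illustrative assumptions, not taken from this paper.

```python
import numpy as np

def dsgd_step(X, grads, W, lr):
    """One generic D-SGD iteration: local gradient step, then gossip averaging.

    X     : (n_workers, dim) current iterates, one row per worker
    grads : (n_workers, dim) stochastic gradients evaluated at each worker
    W     : (n_workers, n_workers) doubly stochastic mixing matrix
    lr    : step size
    """
    return W @ (X - lr * grads)

# Illustrative example: 4 workers on a ring, each minimizing ||x - target_i||^2 / 2.
rng = np.random.default_rng(0)
n, dim = 4, 3
targets = rng.normal(size=(n, dim))
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
X = np.zeros((n, dim))
for _ in range(200):
    grads = X - targets + 0.01 * rng.normal(size=X.shape)  # noisy local gradients
    X = dsgd_step(X, grads, W, lr=0.1)
print("consensus iterate ~ mean target:", X.mean(axis=0), targets.mean(axis=0))
```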

Explicit regularization in overparametrized models via noise injection

A Orvieto, A Raj, H Kersting… - … Conference on Artificial …, 2023 - proceedings.mlr.press
Injecting noise within gradient descent has several desirable features, such as smoothing
and regularizing properties. In this paper, we investigate the effects of injecting noise before …
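
As background, a generic sketch of one common noise-injection scheme: perturbing the iterate with Gaussian noise before evaluating the gradient. This is an illustrative variant under assumed hyperparameters, not necessarily the exact scheme analyzed in the paper.

```python
import numpy as np

def noisy_gd_step(w, grad_fn, lr, sigma, rng):
    """Evaluate the gradient at a Gaussian perturbation of the iterate, then step."""
    w_noisy = w + sigma * rng.normal(size=w.shape)  # inject noise before the gradient
    return w - lr * grad_fn(w_noisy)

# Toy quadratic example.
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])
grad_fn = lambda w: A @ w
w = np.array([3.0, -2.0])
for _ in range(500):
    w = noisy_gd_step(w, grad_fn, lr=0.05, sigma=0.1, rng=rng)
print("final iterate (fluctuates around 0):", w)
```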

Correlated noise provably beats independent noise for differentially private learning

CA Choquette-Choo, K Dvijotham, K Pillutla… - arXiv preprint arXiv …, 2023 - arxiv.org
Differentially private learning algorithms inject noise into the learning process. While the
most common private learning algorithm, DP-SGD, adds independent Gaussian noise in …
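
For context, a minimal sketch of the standard DP-SGD step that correlated-noise methods modify: per-example gradient clipping followed by independent Gaussian noise. The clip norm and noise multiplier values are illustrative assumptions.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step: clip each per-example gradient, average, add Gaussian noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm / len(per_example_grads),
                       size=w.shape)
    return w - lr * (mean_grad + noise)

# Illustrative usage on a toy batch of per-example gradients.
rng = np.random.default_rng(0)
w = np.zeros(5)
batch_grads = [rng.normal(size=5) for _ in range(32)]
w = dp_sgd_step(w, batch_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(w)
```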

Gradient descent with linearly correlated noise: Theory and applications to differential privacy

A Koloskova, R McKenna, Z Charles… - Advances in …, 2023 - proceedings.neurips.cc
We study gradient descent under linearly correlated noise. Our work is motivated by recent
practical methods for optimization with differential privacy (DP), such as DP-FTRL, which …
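
A small sketch of what gradient descent under linearly correlated noise can look like: the noise injected at step t is a fixed linear combination of i.i.d. Gaussian seeds through a matrix B. The particular B below is an illustrative assumption, not the construction used in DP-FTRL or in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, dim = 50, 2
A = np.diag([1.0, 5.0])
grad_fn = lambda w: A @ w

# Correlation structure: noise_t = sum_s B[t, s] * seed_s with i.i.d. Gaussian seeds.
B = np.tril(0.5 ** np.abs(np.subtract.outer(np.arange(T), np.arange(T))))
seeds = rng.normal(size=(T, dim))
noise = B @ seeds  # row t is the (linearly correlated) noise injected at step t

w = np.array([3.0, -2.0])
lr = 0.1
for t in range(T):
    w = w - lr * (grad_fn(w) + noise[t])
print("final iterate:", w)
```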

Why is SAM Robust to Label Noise?

C Baek, Z Kolter, A Raghunathan - arXiv preprint arXiv:2405.03676, 2024 - arxiv.org
Sharpness-Aware Minimization (SAM) is best known for achieving state-of-the-art
performance on natural image and language tasks. However, its most pronounced …
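
For reference, a minimal sketch of the standard SAM update: ascend to a first-order worst-case perturbation within an l2 ball of radius rho, then descend using the gradient at that perturbed point. The toy quadratic objective and hyperparameters are illustrative.

```python
import numpy as np

def sam_step(w, grad_fn, lr, rho):
    """One SAM step: perturb toward higher loss, take the gradient there, descend."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation (first order)
    g_sam = grad_fn(w + eps)                     # gradient at the perturbed point
    return w - lr * g_sam

# Toy quadratic example.
A = np.diag([1.0, 20.0])
grad_fn = lambda w: A @ w
w = np.array([2.0, 2.0])
for _ in range(200):
    w = sam_step(w, grad_fn, lr=0.02, rho=0.05)
print("final iterate:", w)
```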

Optimized injection of noise in activation functions to improve generalization of neural networks

F Duan, F Chapeau-Blondeau, D Abbott - Chaos, Solitons & Fractals, 2024 - Elsevier
This paper proposes a flexible probabilistic activation function that enhances the training
and operation of artificial neural networks by intentionally injecting noise to gain additional …
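
As a generic illustration only (the paper's specific probabilistic activation function is not reproduced here), one simple way to inject noise into an activation is to add Gaussian noise to the pre-activation during training and drop it at inference:

```python
import numpy as np

def noisy_relu(x, sigma, rng, training=True):
    """ReLU with additive Gaussian noise on the pre-activation during training only."""
    if training:
        x = x + sigma * rng.normal(size=np.shape(x))
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 5)
print(noisy_relu(x, sigma=0.1, rng=rng, training=True))   # stochastic output
print(noisy_relu(x, sigma=0.1, rng=rng, training=False))  # deterministic ReLU
```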

Implicit regularization in heavy-ball momentum accelerated stochastic gradient descent

A Ghosh, H Lyu, X Zhang, R Wang - arXiv preprint arXiv:2302.00849, 2023 - arxiv.org
It is well known that a finite step-size ($h$) in Gradient Descent (GD) implicitly regularizes
solutions toward flatter minima. A natural question to ask is "Does the momentum parameter …
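
For reference, a minimal sketch of the heavy-ball (Polyak) momentum update studied in this line of work, with step size h and momentum parameter beta; the toy quadratic objective and constants are illustrative.

```python
import numpy as np

def heavy_ball_step(w, w_prev, grad_fn, h, beta):
    """Heavy-ball update: w_{t+1} = w_t - h * grad(w_t) + beta * (w_t - w_{t-1})."""
    w_next = w - h * grad_fn(w) + beta * (w - w_prev)
    return w_next, w

# Toy quadratic example.
A = np.diag([1.0, 10.0])
grad_fn = lambda w: A @ w
w_prev = w = np.array([3.0, -2.0])
for _ in range(300):
    w, w_prev = heavy_ball_step(w, w_prev, grad_fn, h=0.05, beta=0.9)
print("final iterate:", w)
```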

Edge intelligence over the air: Two faces of interference in federated learning

Z Chen, HH Yang, TQS Quek - IEEE Communications …, 2023 - ieeexplore.ieee.org
Federated edge learning is envisioned as the bedrock of enabling intelligence in next-
generation wireless networks, but the limited spectral resources often constrain its …

Improving generalization of pre-trained language models via stochastic weight averaging

P Lu, I Kobyzev, M Rezagholizadeh, A Rashid… - arXiv preprint arXiv …, 2022 - arxiv.org
Knowledge Distillation (KD) is a commonly used technique for improving the generalization
of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such …
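
For context, a minimal sketch of generic stochastic weight averaging (SWA): keep a running average of the weights visited along the training trajectory and use the average at evaluation time. The noisy-SGD loop and warm-up schedule below are illustrative, not the paper's specific distillation setup.

```python
import numpy as np

def update_swa(swa_w, w, n_averaged):
    """Running average of weights: swa <- (swa * n + w) / (n + 1)."""
    return (swa_w * n_averaged + w) / (n_averaged + 1), n_averaged + 1

# Illustrative usage: average the iterates of a noisy SGD run on a toy quadratic.
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])
w = np.array([3.0, -2.0])
swa_w, n_avg = np.zeros_like(w), 0
for step in range(1000):
    g = A @ w + 0.5 * rng.normal(size=w.shape)  # noisy gradient
    w = w - 0.05 * g
    if step >= 500 and step % 10 == 0:          # start averaging after a warm-up
        swa_w, n_avg = update_swa(swa_w, w, n_avg)
print("last iterate:", w, "  SWA iterate:", swa_w)
```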