EF21-P and friends: Improved theoretical communication complexity for distributed optimization with bidirectional compression

K Gruntkowska, A Tyurin… - … Conference on Machine …, 2023 - proceedings.mlr.press
In this work we focus on distributed optimization problems where the communication time
between the server and the workers is non-negligible. We obtain …

Decentralized SGD and average-direction SAM are asymptotically equivalent

T Zhu, F He, K Chen, M Song… - … Conference on Machine …, 2023 - proceedings.mlr.press
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning across
massive numbers of devices simultaneously, without the control of a central server. However, existing …
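
For reference, a minimal sketch of a generic D-SGD iteration (local gradient step followed by gossip averaging over a doubly stochastic mixing matrix). The ring topology, quadratic objectives, and step size below are illustrative assumptions, not taken from this paper.

```python
import numpy as np

def dsgd_step(X, grads, W, lr):
    """One generic D-SGD iteration: local gradient step, then gossip averaging.

    X     : (n_workers, dim) current iterates, one row per worker
    grads : (n_workers, dim) stochastic gradients evaluated at each worker
    W     : (n_workers, n_workers) doubly stochastic mixing matrix
    lr    : step size
    """
    return W @ (X - lr * grads)

# Illustrative example: 4 workers on a ring, each minimizing ||x - target_i||^2 / 2.
rng = np.random.default_rng(0)
n, dim = 4, 3
targets = rng.normal(size=(n, dim))
W = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])
X = np.zeros((n, dim))
for _ in range(200):
    grads = X - targets + 0.01 * rng.normal(size=X.shape)  # noisy local gradients
    X = dsgd_step(X, grads, W, lr=0.1)
print("consensus iterate ~ mean target:", X.mean(axis=0), targets.mean(axis=0))
```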

Explicit regularization in overparametrized models via noise injection

A Orvieto, A Raj, H Kersting… - … Conference on Artificial …, 2023 - proceedings.mlr.press
Injecting noise within gradient descent has several desirable features, such as smoothing
and regularizing properties. In this paper, we investigate the effects of injecting noise before …
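
As background, a generic sketch of one common noise-injection scheme: perturbing the iterate with Gaussian noise before evaluating the gradient. This is an illustrative variant under assumed hyperparameters, not necessarily the exact scheme analyzed in the paper.

```python
import numpy as np

def noisy_gd_step(w, grad_fn, lr, sigma, rng):
    """Evaluate the gradient at a Gaussian perturbation of the iterate, then step."""
    w_noisy = w + sigma * rng.normal(size=w.shape)  # inject noise before the gradient
    return w - lr * grad_fn(w_noisy)

# Toy quadratic example.
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])
grad_fn = lambda w: A @ w
w = np.array([3.0, -2.0])
for _ in range(500):
    w = noisy_gd_step(w, grad_fn, lr=0.05, sigma=0.1, rng=rng)
print("final iterate (fluctuates around 0):", w)
```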

Correlated noise provably beats independent noise for differentially private learning

CA Choquette-Choo, K Dvijotham, K Pillutla… - arXiv preprint arXiv …, 2023 - arxiv.org
Differentially private learning algorithms inject noise into the learning process. While the
most common private learning algorithm, DP-SGD, adds independent Gaussian noise in …
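
For context, a minimal sketch of the standard DP-SGD step that correlated-noise methods modify: per-example gradient clipping followed by independent Gaussian noise. The clip norm and noise multiplier values are illustrative assumptions.

```python
import numpy as np

def dp_sgd_step(w, per_example_grads, lr, clip_norm, noise_multiplier, rng):
    """One DP-SGD step: clip each per-example gradient, average, add Gaussian noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    noise = rng.normal(scale=noise_multiplier * clip_norm / len(per_example_grads),
                       size=w.shape)
    return w - lr * (mean_grad + noise)

# Illustrative usage on a toy batch of per-example gradients.
rng = np.random.default_rng(0)
w = np.zeros(5)
batch_grads = [rng.normal(size=5) for _ in range(32)]
w = dp_sgd_step(w, batch_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(w)
```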

Gradient descent with linearly correlated noise: Theory and applications to differential privacy

A Koloskova, R McKenna, Z Charles… - Advances in …, 2023 - proceedings.neurips.cc
We study gradient descent under linearly correlated noise. Our work is motivated by recent
practical methods for optimization with differential privacy (DP), such as DP-FTRL, which …
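
A small sketch of what gradient descent under linearly correlated noise can look like: the noise injected at step t is a fixed linear combination of i.i.d. Gaussian seeds through a matrix B. The particular B below is an illustrative assumption, not the construction used in DP-FTRL or in this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, dim = 50, 2
A = np.diag([1.0, 5.0])
grad_fn = lambda w: A @ w

# Correlation structure: noise_t = sum_s B[t, s] * seed_s with i.i.d. Gaussian seeds.
B = np.tril(0.5 ** np.abs(np.subtract.outer(np.arange(T), np.arange(T))))
seeds = rng.normal(size=(T, dim))
noise = B @ seeds  # row t is the (linearly correlated) noise injected at step t

w = np.array([3.0, -2.0])
lr = 0.1
for t in range(T):
    w = w - lr * (grad_fn(w) + noise[t])
print("final iterate:", w)
```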

Why is SAM Robust to Label Noise?

C Baek, Z Kolter, A Raghunathan - arXiv preprint arXiv:2405.03676, 2024 - arxiv.org
Sharpness-Aware Minimization (SAM) is best known for achieving state-of-the-art
performance on natural image and language tasks. However, its most pronounced …
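
For reference, a minimal sketch of the standard SAM update: ascend to a first-order worst-case perturbation within an l2 ball of radius rho, then descend using the gradient at that perturbed point. The toy quadratic objective and hyperparameters are illustrative.

```python
import numpy as np

def sam_step(w, grad_fn, lr, rho):
    """One SAM step: perturb toward higher loss, take the gradient there, descend."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case perturbation (first order)
    g_sam = grad_fn(w + eps)                     # gradient at the perturbed point
    return w - lr * g_sam

# Toy quadratic example.
A = np.diag([1.0, 20.0])
grad_fn = lambda w: A @ w
w = np.array([2.0, 2.0])
for _ in range(200):
    w = sam_step(w, grad_fn, lr=0.02, rho=0.05)
print("final iterate:", w)
```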

Optimized injection of noise in activation functions to improve generalization of neural networks

F Duan, F Chapeau-Blondeau, D Abbott - Chaos, Solitons & Fractals, 2024 - Elsevier
This paper proposes a flexible probabilistic activation function that enhances the training
and operation of artificial neural networks by intentionally injecting noise to gain additional …
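
As a generic illustration only (the paper's specific probabilistic activation function is not reproduced here), one simple way to inject noise into an activation is to add Gaussian noise to the pre-activation during training and drop it at inference:

```python
import numpy as np

def noisy_relu(x, sigma, rng, training=True):
    """ReLU with additive Gaussian noise on the pre-activation during training only."""
    if training:
        x = x + sigma * rng.normal(size=np.shape(x))
    return np.maximum(x, 0.0)

rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 5)
print(noisy_relu(x, sigma=0.1, rng=rng, training=True))   # stochastic output
print(noisy_relu(x, sigma=0.1, rng=rng, training=False))  # deterministic ReLU
```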

Implicit regularization in heavy-ball momentum accelerated stochastic gradient descent

A Ghosh, H Lyu, X Zhang, R Wang - arXiv preprint arXiv:2302.00849, 2023 - arxiv.org
It is well known that a finite step-size ($h$) in Gradient Descent (GD) implicitly regularizes
solutions toward flatter minima. A natural question to ask is "Does the momentum parameter …
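
For reference, a minimal sketch of the heavy-ball (Polyak) momentum update studied in this line of work, with step size h and momentum parameter beta; the toy quadratic objective and constants are illustrative.

```python
import numpy as np

def heavy_ball_step(w, w_prev, grad_fn, h, beta):
    """Heavy-ball update: w_{t+1} = w_t - h * grad(w_t) + beta * (w_t - w_{t-1})."""
    w_next = w - h * grad_fn(w) + beta * (w - w_prev)
    return w_next, w

# Toy quadratic example.
A = np.diag([1.0, 10.0])
grad_fn = lambda w: A @ w
w_prev = w = np.array([3.0, -2.0])
for _ in range(300):
    w, w_prev = heavy_ball_step(w, w_prev, grad_fn, h=0.05, beta=0.9)
print("final iterate:", w)
```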

Edge intelligence over the air: Two faces of interference in federated learning

Z Chen, HH Yang, TQS Quek - IEEE Communications …, 2023 - ieeexplore.ieee.org
Federated edge learning is envisioned as the bedrock of enabling intelligence in next-
generation wireless networks, but the limited spectral resources often constrain its …

Improving generalization of pre-trained language models via stochastic weight averaging

P Lu, I Kobyzev, M Rezagholizadeh, A Rashid… - arXiv preprint arXiv …, 2022 - arxiv.org
Knowledge Distillation (KD) is a commonly used technique for improving the generalization
of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such …
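
For context, a minimal sketch of generic stochastic weight averaging (SWA): keep a running average of the weights visited along the training trajectory and use the average at evaluation time. The noisy-SGD loop and warm-up schedule below are illustrative, not the paper's specific distillation setup.

```python
import numpy as np

def update_swa(swa_w, w, n_averaged):
    """Running average of weights: swa <- (swa * n + w) / (n + 1)."""
    return (swa_w * n_averaged + w) / (n_averaged + 1), n_averaged + 1

# Illustrative usage: average the iterates of a noisy SGD run on a toy quadratic.
rng = np.random.default_rng(0)
A = np.diag([1.0, 10.0])
w = np.array([3.0, -2.0])
swa_w, n_avg = np.zeros_like(w), 0
for step in range(1000):
    g = A @ w + 0.5 * rng.normal(size=w.shape)  # noisy gradient
    w = w - 0.05 * g
    if step >= 500 and step % 10 == 0:          # start averaging after a warm-up
        swa_w, n_avg = update_swa(swa_w, w, n_avg)
print("last iterate:", w, "  SWA iterate:", swa_w)
```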