EF21-P and friends: Improved theoretical communication complexity for distributed optimization with bidirectional compression
In this work we focus our attention on distributed optimization problems in the context where
the communication time between the server and the workers is non-negligible. We obtain …
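Though the snippet is truncated, the error-feedback idea behind EF21-style methods can be sketched: instead of compressing gradients directly, each step compresses the difference between the fresh gradient and a running estimate. A minimal single-worker NumPy sketch with a top-k compressor on a toy quadratic (the compressor, step size, and problem are illustrative assumptions, not the paper's EF21-P algorithm):

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero the rest."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)        # gradient of f(x) = 0.5 * ||A x - b||^2

x, g = np.zeros(10), grad(np.zeros(10))   # g is the worker's gradient estimate
lr = 0.01
for _ in range(500):
    x = x - lr * g
    # EF21-style step: compress the *difference* between the fresh gradient
    # and the running estimate, so the estimate tracks the true gradient.
    g = g + top_k(grad(x) - g, k=3)
print("final gradient norm:", np.linalg.norm(grad(x)))
```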
Decentralized SGD and average-direction SAM are asymptotically equivalent
Decentralized stochastic gradient descent (D-SGD) allows collaborative learning on
massive devices simultaneously without the control of a central server. However, existing …
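The D-SGD update behind this line of work is simple to simulate: each node gossip-averages its parameters with its neighbors under a doubly stochastic mixing matrix W, then takes a local stochastic gradient step. A toy sketch (the ring topology and quadratic losses are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4                      # 5 nodes, 4 parameters each
# Local quadratic losses f_i(x) = 0.5 * ||x - c_i||^2, so grad_i(x) = x - c_i.
C = rng.standard_normal((n, d))

# Doubly stochastic mixing matrix for a ring: average with the two neighbors.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

X = np.zeros((n, d))             # row i = parameters held by node i
lr = 0.1
for _ in range(300):
    noise = 0.01 * rng.standard_normal((n, d))   # stochastic gradient noise
    X = W @ X - lr * ((X - C) + noise)           # gossip average + local SGD step
print("consensus distance:", np.linalg.norm(X - X.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(X.mean(axis=0) - C.mean(axis=0)))
```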
Explicit regularization in overparametrized models via noise injection
Injecting noise within gradient descent has several desirable features, such as smoothing
and regularizing properties. In this paper, we investigate the effects of injecting noise before …
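The snippet is truncated, so the exact injection point is elided; one common scheme in this family perturbs the parameters with zero-mean Gaussian noise before each gradient evaluation, which on average optimizes a smoothed version of the loss. A minimal sketch under that assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)   # gradient of 0.5 * ||A x - b||^2

x = np.zeros(10)
lr, sigma = 0.01, 0.1
for _ in range(500):
    xi = sigma * rng.standard_normal(x.shape)
    # Evaluate the gradient at a perturbed point; in expectation this descends
    # a Gaussian-smoothed loss, which acts as an explicit regularizer.
    x = x - lr * grad(x + xi)
```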
Correlated noise provably beats independent noise for differentially private learning
Differentially private learning algorithms inject noise into the learning process. While the
most common private learning algorithm, DP-SGD, adds independent Gaussian noise in …
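The mechanism named here is concrete: DP-SGD clips each per-example gradient to a fixed norm and adds independent Gaussian noise to the clipped sum. A minimal NumPy sketch on a linear-regression batch (the clip norm and noise multiplier are illustrative, and no privacy accounting is done):

```python
import numpy as np

rng = np.random.default_rng(0)
X, y = rng.standard_normal((32, 10)), rng.standard_normal(32)
w = np.zeros(10)
lr, clip, sigma = 0.1, 1.0, 0.5

for _ in range(100):
    # Per-example gradients of 0.5 * (x_i . w - y_i)^2.
    residuals = X @ w - y
    per_example = residuals[:, None] * X                    # shape (32, 10)
    norms = np.linalg.norm(per_example, axis=1, keepdims=True)
    clipped = per_example / np.maximum(1.0, norms / clip)   # clip to norm <= clip
    # Independent Gaussian noise, calibrated to the clipping norm.
    noisy_sum = clipped.sum(axis=0) + sigma * clip * rng.standard_normal(10)
    w = w - lr * noisy_sum / len(X)
```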
Gradient descent with linearly correlated noise: Theory and applications to differential privacy
We study gradient descent under linearly correlated noise. Our work is motivated by recent
practical methods for optimization with differential privacy (DP), such as DP-FTRL, which …
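In this setting the injected noise at step $t$ is a fixed linear combination of i.i.d. Gaussian seeds, i.e. the iterate is $w_{t+1} = w_t - \eta(\nabla f(w_t) + (Bz)_t)$ for some matrix $B$, and $B = I$ recovers independent noise. A toy sketch with a hypothetical exponentially decaying lower-triangular $B$:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 200, 10
grad = lambda w: w                      # gradient of 0.5 * ||w||^2 (toy loss)

Z = rng.standard_normal((T, d))         # i.i.d. Gaussian seeds z_0, ..., z_{T-1}
# A hypothetical lower-triangular mixing matrix B: each step's noise is a
# decaying combination of current and past seeds (B = I gives independent noise).
B = np.tril(0.5 ** np.abs(np.subtract.outer(np.arange(T), np.arange(T))))
N = B @ Z                               # correlated noise sequence (Bz)_t

w = np.ones(d)
lr = 0.1
for t in range(T):
    w = w - lr * (grad(w) + N[t])
```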
Why is SAM Robust to Label Noise?
Sharpness-Aware Minimization (SAM) is best known for achieving state-of-the-art
performance on natural image and language tasks. However, its most pronounced …
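For reference, the SAM step the title refers to first ascends to an adversarial point within a radius-ρ ball around the current iterate, then descends using the gradient taken there. A minimal sketch on a toy quadratic:

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)      # gradient of 0.5 * ||A x - b||^2

x = np.zeros(10)
lr, rho = 0.01, 0.05
for _ in range(500):
    g = grad(x)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # ascend to the worst-case neighbor
    x = x - lr * grad(x + eps)                   # descend with the perturbed gradient
```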
Optimized injection of noise in activation functions to improve generalization of neural networks
This paper proposes a flexible probabilistic activation function that enhances the training
and operation of artificial neural networks by intentionally injecting noise to gain additional …
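A generic version of the idea, not the paper's specific probabilistic activation: add zero-mean noise inside the activation during training and make it deterministic at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_relu(z, sigma=0.1, training=True):
    """ReLU with additive Gaussian noise on the pre-activation during training.

    A generic noise-injected activation for illustration; the paper proposes a
    specific probabilistic activation that this sketch does not reproduce.
    """
    if training:
        z = z + sigma * rng.standard_normal(z.shape)
    return np.maximum(z, 0.0)

h_train = noisy_relu(rng.standard_normal((4, 8)))                   # stochastic
h_eval = noisy_relu(rng.standard_normal((4, 8)), training=False)    # deterministic
```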
Implicit regularization in heavy-ball momentum accelerated stochastic gradient descent
It is well known that the finite step-size ($h$) in Gradient Descent (GD) implicitly regularizes
solutions to flatter minima. A natural question to ask is: "Does the momentum parameter …
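The update in question is heavy-ball GD, $x_{t+1} = x_t - h\nabla f(x_t) + \beta(x_t - x_{t-1})$; the question is whether the momentum parameter $\beta$ strengthens the implicit bias toward flat minima beyond what the step size $h$ alone provides. A minimal sketch of the update (toy quadratic, illustrative constants):

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)      # gradient of 0.5 * ||A x - b||^2

x_prev = x = np.zeros(10)
h, beta = 0.005, 0.9
for _ in range(1000):
    # Heavy-ball step: gradient step plus momentum on the previous displacement.
    x, x_prev = x - h * grad(x) + beta * (x - x_prev), x
```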
Edge intelligence over the air: Two faces of interference in federated learning
Federated edge learning is envisioned as the bedrock of enabling intelligence in next-
generation wireless networks, but the limited spectral resources often constrain its …
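One face of the interference the title alludes to is over-the-air aggregation: when devices transmit analog updates simultaneously, the multiple-access channel sums them, so the server obtains the aggregate for free, corrupted only by receiver noise. A toy sketch of that channel model (an illustrative assumption, not the paper's system model):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 8
updates = rng.standard_normal((n, d))           # local model updates from n devices

# All devices transmit at once; the multiple-access channel naturally sums the
# analog signals, and the server observes that sum plus receiver noise.
channel_noise = 0.05 * rng.standard_normal(d)
received = updates.sum(axis=0) + channel_noise
aggregate = received / n                        # over-the-air estimate of the mean update

print("aggregation error:", np.linalg.norm(aggregate - updates.mean(axis=0)))
```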
Improving generalization of pre-trained language models via stochastic weight averaging
Knowledge Distillation (KD) is a commonly used technique for improving the generalization
of compact Pre-trained Language Models (PLMs) on downstream tasks. However, such …
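Stochastic weight averaging, named in the title, maintains a running average of the weights visited late in training and evaluates with that average. A minimal sketch (noisy SGD on a quadratic stands in for fine-tuning a PLM; the averaging start iteration is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
A, b = rng.standard_normal((20, 10)), rng.standard_normal(20)
grad = lambda x: A.T @ (A @ x - b)      # gradient of 0.5 * ||A x - b||^2

x = np.zeros(10)
swa, n_avg = np.zeros(10), 0
lr, swa_start = 0.01, 100
for t in range(500):
    g = grad(x) + 0.5 * rng.standard_normal(10)   # noisy (stochastic) gradient
    x = x - lr * g
    if t >= swa_start:                            # average iterates from swa_start on
        swa = (n_avg * swa + x) / (n_avg + 1)
        n_avg += 1
# `swa` is the averaged solution used at evaluation time.
```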