Stability of stochastic gradient descent on nonsmooth convex losses

R Bassily, V Feldman, C Guzmán… - Advances in Neural …, 2020 - proceedings.neurips.cc
Uniform stability is a notion of algorithmic stability that bounds the worst-case change in the
model output by the algorithm when a single data point in the dataset is replaced. An …
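
For orientation, a minimal sketch of the definition this abstract paraphrases (Bousquet–Elisseeff style; the notation below is ours, not necessarily the paper's):

```latex
% \varepsilon-uniform stability: a randomized algorithm A is \varepsilon-uniformly
% stable if, for all datasets S, S' of size n that differ in exactly one example
% and every test point z,
\[
  \sup_{z}\,\bigl|\,\mathbb{E}_{A}[\ell(A(S);z)] - \mathbb{E}_{A}[\ell(A(S');z)]\,\bigr|
  \;\le\; \varepsilon ,
\]
% in which case the expected generalization gap of A is also at most \varepsilon.
```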

On Efficient Training of Large-Scale Deep Learning Models

L Shen, Y Sun, Z Yu, L Ding, X Tian, D Tao - ACM Computing Surveys, 2024 - dl.acm.org
The field of deep learning has witnessed significant progress in recent times, particularly in
areas such as computer vision (CV), natural language processing (NLP), and speech. The …

On convergence of FedProx: Local dissimilarity invariant bounds, non-smoothness and beyond

X Yuan, P Li - Advances in Neural Information Processing …, 2022 - proceedings.neurips.cc
The FedProx algorithm is a simple yet powerful distributed proximal point optimization
method widely used for federated learning (FL) over heterogeneous data. Despite its …
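
As a sketch of the mechanism (notation ours): in round t, each client k approximately minimizes its local loss plus a proximal term anchored at the current global model.

```latex
% FedProx local subproblem at round t for client k (\mu > 0 weights the prox term):
\[
  w_k^{t+1} \;\approx\; \arg\min_{w}\; F_k(w) + \tfrac{\mu}{2}\lVert w - w^{t}\rVert^{2},
\]
% the server then averages the returned w_k^{t+1} into the next global model w^{t+1}.
```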

On the algorithmic stability of adversarial training

Y Xing, Q Song, G Cheng - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Adversarial training is a popular tool to remedy the vulnerability of deep learning models
against adversarial attacks, and there is rich theoretical literature on the training loss of …
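
For context, the standard min-max formulation of adversarial training (notation ours); the inner maximization is typically approximated by a few projected-gradient (PGD) steps.

```latex
% Adversarial training: fit parameters \theta against worst-case
% \ell_p-bounded perturbations \delta_i of radius \epsilon:
\[
  \min_{\theta}\; \frac{1}{n}\sum_{i=1}^{n}
    \max_{\lVert \delta_i \rVert_p \le \epsilon}
    \ell\bigl(f_\theta(x_i + \delta_i),\, y_i\bigr).
\]
```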

Information-theoretic generalization bounds for stochastic gradient descent

G Neu, GK Dziugaite, M Haghifam… - … on Learning Theory, 2021 - proceedings.mlr.press
We study the generalization properties of the popular stochastic optimization method known
as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our …
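
A prototypical mutual-information bound (Xu–Raginsky style) that this line of work tightens for SGD; the symbols below are our notation, not the paper's refined bound:

```latex
% For a \sigma-sub-Gaussian loss, training set S of n samples, and output weights W,
\[
  \bigl|\,\mathbb{E}[\mathrm{gen}(W,S)]\,\bigr|
  \;\le\; \sqrt{\tfrac{2\sigma^{2}}{n}\, I(W;S)} ,
\]
% where I(W;S) is the mutual information between the training data and the weights.
```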

Topology-aware generalization of decentralized SGD

T Zhu, F He, L Zhang, Z Niu… - … on Machine Learning, 2022 - proceedings.mlr.press
This paper studies the algorithmic stability and generalizability of decentralized stochastic
gradient descent (D-SGD). We prove that the consensus model learned by D-SGD is …
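
As a sketch of the update being analyzed (notation ours): each node averages with its neighbors through a doubly stochastic mixing matrix and then takes a local stochastic gradient step.

```latex
% One D-SGD step at node i, with mixing matrix W and step size \eta:
\[
  x_i^{t+1} \;=\; \sum_{j=1}^{m} W_{ij}\, x_j^{t} \;-\; \eta\, \nabla f_i(x_i^{t};\xi_i^{t}),
\]
% so the communication topology enters the dynamics only through the spectrum of W.
```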

On the optimization and generalization of multi-head attention

P Deora, R Ghaderi, H Taheri… - arXiv preprint arXiv …, 2023 - arxiv.org
The training and generalization dynamics of the Transformer's core mechanism, namely the
Attention mechanism, remain under-explored. Moreover, existing analyses primarily focus on …
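
For reference, the standard multi-head attention map whose training dynamics are studied here (notation ours):

```latex
% Input X in R^{n \times d}, per-head width d_h, projections W_Q^h, W_K^h, W_V^h, output W_O:
\[
  \mathrm{head}_h(X)
  = \mathrm{softmax}\!\Bigl(\tfrac{X W_Q^{h} (X W_K^{h})^{\top}}{\sqrt{d_h}}\Bigr) X W_V^{h},
  \qquad
  \mathrm{MHA}(X) = \bigl[\mathrm{head}_1(X),\ldots,\mathrm{head}_H(X)\bigr] W_O .
\]
```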

Algorithmic stability of heavy-tailed SGD with general loss functions

A Raj, L Zhu, M Gurbuzbalaban… - … on Machine Learning, 2023 - proceedings.mlr.press
Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several
empirical studies. Experimental evidence in previous works suggests a strong interplay …
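
A common way this literature models the reported heavy tails (our sketch, not necessarily the paper's exact assumption) is gradient noise with power-law tails:

```latex
% Gradient noise with tail index \alpha: heavier tails as \alpha decreases,
\[
  \mathbb{P}\bigl(\lVert \nabla f(w_t;\xi_t) - \nabla F(w_t) \rVert > u\bigr)
  \;\sim\; u^{-\alpha}, \qquad \alpha \in (1,2],
\]
% so the noise variance may be infinite and bounds are stated in terms of \alpha.
```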

Stability-based generalization analysis of the asynchronous decentralized SGD

X Deng, T Sun, S Li, D Li - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org
The generalization ability often determines the success of machine learning algorithms in
practice. Therefore, it is of great theoretical and practical importance to understand and …

Three-way trade-off in multi-objective learning: Optimization, generalization and conflict-avoidance

L Chen, H Fernando, Y Ying… - Advances in Neural …, 2024 - proceedings.neurips.cc
Multi-objective learning (MOL) often arises in emerging machine learning problems when
multiple learning criteria or tasks need to be addressed. Recent works have developed …
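
As a sketch of the setting (notation ours): MOL minimizes several objectives jointly, and conflict-avoidant methods follow a common-descent direction, e.g. an MGDA-style weighting over the simplex.

```latex
% M objectives; \Delta^M is the probability simplex over objective weights:
\[
  \min_{w}\;\bigl(f_1(w),\ldots,f_M(w)\bigr), \qquad
  d_t = -\sum_{m=1}^{M}\lambda_m^{\star}\nabla f_m(w_t),
\]
\[
  \lambda^{\star} \in \arg\min_{\lambda\in\Delta^{M}}
    \Bigl\lVert \sum_{m=1}^{M}\lambda_m \nabla f_m(w_t)\Bigr\rVert^{2}.
\]
```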