Stability of stochastic gradient descent on nonsmooth convex losses
Uniform stability is a notion of algorithmic stability that bounds the worst-case change in the
model output by the algorithm when a single data point in the dataset is replaced. An …
On Efficient Training of Large-Scale Deep Learning Models
The field of deep learning has witnessed significant progress in recent times, particularly in
areas such as computer vision (CV), natural language processing (NLP), and speech. The …
On convergence of FedProx: Local dissimilarity invariant bounds, non-smoothness and beyond
The FedProx algorithm is a simple yet powerful distributed proximal point optimization
method widely used for federated learning (FL) over heterogeneous data. Despite its …
On the algorithmic stability of adversarial training
Adversarial training is a popular tool to remedy the vulnerability of deep learning models
against adversarial attacks, and there is rich theoretical literature on the training loss of …
Information-theoretic generalization bounds for stochastic gradient descent
We study the generalization properties of the popular stochastic optimization method known
as stochastic gradient descent (SGD) for optimizing general non-convex loss functions. Our …
Topology-aware generalization of decentralized SGD
This paper studies the algorithmic stability and generalizability of decentralized stochastic
gradient descent (D-SGD). We prove that the consensus model learned by D-SGD is …
On the optimization and generalization of multi-head attention
The training and generalization dynamics of the Transformer's core mechanism, namely the
Attention mechanism, remain under-explored. Besides, existing analyses primarily focus on …
Algorithmic stability of heavy-tailed SGD with general loss functions
Heavy-tail phenomena in stochastic gradient descent (SGD) have been reported in several
empirical studies. Experimental evidence in previous works suggests a strong interplay …
Stability-based generalization analysis of the asynchronous decentralized SGD
The generalization ability often determines the success of machine learning algorithms in
practice. Therefore, it is of great theoretical and practical importance to understand and …
Three-way trade-off in multi-objective learning: Optimization, generalization and conflict-avoidance
Multi-objective learning (MOL) often arises in emerging machine learning problems when
multiple learning criteria or tasks need to be addressed. Recent works have developed …