[BOOK][B] Dive into deep learning
Deep learning has revolutionized pattern recognition, introducing tools that power a wide
range of technologies in such diverse fields as computer vision, natural language …
Towards efficient and scalable sharpness-aware minimization
Abstract Recently, Sharpness-Aware Minimization (SAM), which connects the geometry of
the loss landscape and generalization, has demonstrated a significant performance boost …
Deep leakage from gradients
Passing gradient is a widely used scheme in modern multi-node learning system (e.g.,
distributed training, collaborative learning). For a long time, people used to believe that …
PipeDream: Generalized pipeline parallelism for DNN training
DNN training is extremely time-consuming, necessitating efficient multi-accelerator
parallelization. Current approaches to parallelizing training primarily use intra-batch …
Lookahead optimizer: k steps forward, 1 step back
The vast majority of successful deep neural networks are trained using variants of stochastic
gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly …
Grad-match: Gradient matching based data subset selection for efficient deep model training
The great success of modern machine learning models on large datasets is contingent on
extensive computational resources with high financial and environmental costs. One way to …
Large batch optimization for deep learning: Training BERT in 76 minutes
Training large deep neural networks on massive datasets is computationally very
challenging. There has been a recent surge in interest in using large batch stochastic …