[BOOK][B] Dive into Deep Learning

A Zhang, ZC Lipton, M Li, AJ Smola - 2023 - books.google.com
Deep learning has revolutionized pattern recognition, introducing tools that power a wide
range of technologies in such diverse fields as computer vision, natural language …

Towards efficient and scalable sharpness-aware minimization

Y Liu, S Mai, X Chen, CJ Hsieh… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Recently, Sharpness-Aware Minimization (SAM), which connects the geometry of
the loss landscape and generalization, has demonstrated a significant performance boost …
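As a rough illustration of the idea this entry names, below is a minimal sketch of one SAM step on a toy quadratic loss, assuming the standard two-pass approximation (ascend within an L2 ball, then descend with the perturbed gradient); the toy loss, rho, and learning rate are illustrative assumptions, not details from this listing.

```python
import numpy as np

def toy_loss(w):
    return 0.5 * np.sum(w ** 2)

def toy_grad(w):
    return w

def sam_step(w, rho=0.05, lr=0.1):
    g = toy_grad(w)
    # Ascend to the approximate worst-case point within an L2 ball of radius rho.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # Descend using the gradient evaluated at the perturbed weights.
    g_sharp = toy_grad(w + eps)
    return w - lr * g_sharp

w = np.array([1.0, -2.0])
for _ in range(10):
    w = sam_step(w)
print(w, toy_loss(w))
```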

Deep leakage from gradients

L Zhu, Z Liu, S Han - Advances in neural information …, 2019 - proceedings.neurips.cc
Passing gradients is a widely used scheme in modern multi-node learning systems (e.g.,
distributed training, collaborative learning). For a long time, people used to believe that …
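As a sketch of the gradient-leakage setting this entry describes, the toy example below optimizes dummy data so that its gradients match a "leaked" gradient from a victim batch; the tiny linear model, soft-label parameterization, and optimizer settings are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(8, 3)

# "Victim" batch and the gradient that would be shared in distributed training.
x_true = torch.randn(1, 8)
y_true = torch.tensor([2])
loss = F.cross_entropy(model(x_true), y_true)
true_grads = [g.detach() for g in torch.autograd.grad(loss, model.parameters())]

# Attacker optimizes dummy data and soft labels so their gradients match the leaked ones.
x_dummy = torch.randn(1, 8, requires_grad=True)
y_dummy = torch.randn(1, 3, requires_grad=True)
opt = torch.optim.LBFGS([x_dummy, y_dummy])

def closure():
    opt.zero_grad()
    pred = model(x_dummy)
    dummy_loss = torch.sum(-F.softmax(y_dummy, dim=-1) * F.log_softmax(pred, dim=-1))
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(), create_graph=True)
    diff = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    diff.backward()
    return diff

for _ in range(30):
    opt.step(closure)
print(torch.norm(x_dummy.detach() - x_true))  # reconstruction error vs. the true input
```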

PipeDream: Generalized pipeline parallelism for DNN training

D Narayanan, A Harlap, A Phanishayee… - Proceedings of the 27th …, 2019 - dl.acm.org
DNN training is extremely time-consuming, necessitating efficient multi-accelerator
parallelization. Current approaches to parallelizing training primarily use intra-batch …
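To make the pipelining idea behind this entry concrete, the sketch below prints which micro-batch each model stage processes at each time step, showing how splitting a minibatch keeps multiple stages busy; the stage and micro-batch counts are illustrative, and PipeDream's 1F1B backward scheduling and weight versioning are omitted.

```python
NUM_STAGES = 3        # model split into 3 sequential stages (one per device)
NUM_MICROBATCHES = 5  # minibatch split into 5 micro-batches

# At time step t, stage s works on micro-batch (t - s) if that index is in range.
for t in range(NUM_STAGES + NUM_MICROBATCHES - 1):
    row = []
    for s in range(NUM_STAGES):
        mb = t - s
        row.append(f"mb{mb}" if 0 <= mb < NUM_MICROBATCHES else "idle")
    print(f"t={t}: " + " | ".join(f"stage{s}:{w}" for s, w in enumerate(row)))
```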

Lookahead optimizer: k steps forward, 1 step back

M Zhang, J Lucas, J Ba… - Advances in neural …, 2019 - proceedings.neurips.cc
The vast majority of successful deep neural networks are trained using variants of stochastic
gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly …
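The title's "k steps forward, 1 step back" rule can be sketched in a few lines: run k fast steps of an inner optimizer, then interpolate the slow weights toward the fast weights. The toy quadratic, plain-SGD inner loop, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

def grad(w):
    return w  # gradient of the toy loss 0.5 * ||w||^2

def lookahead(w0, k=5, alpha=0.5, lr=0.1, outer_steps=20):
    slow = w0.copy()
    for _ in range(outer_steps):
        fast = slow.copy()
        # k fast steps of the inner optimizer.
        for _ in range(k):
            fast -= lr * grad(fast)
        # One slow step: interpolate the slow weights toward the fast weights.
        slow += alpha * (fast - slow)
    return slow

print(lookahead(np.array([3.0, -1.5])))
```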

Grad-match: Gradient matching based data subset selection for efficient deep model training

K Killamsetty, S Durga… - International …, 2021 - proceedings.mlr.press
The great success of modern machine learning models on large datasets is contingent on
extensive computational resources with high financial and environmental costs. One way to …
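As a rough sketch of gradient-matching subset selection, the loop below greedily picks examples whose per-example gradients best reconstruct the full-data gradient, refitting least-squares weights each round; the synthetic gradients, budget, and this simplified matching-pursuit loop are illustrative assumptions rather than GRAD-MATCH's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, budget = 200, 16, 10
per_example_grads = rng.normal(size=(n, d))  # rows: per-example gradients
full_grad = per_example_grads.mean(axis=0)   # target: full-data gradient

selected, residual = [], full_grad.copy()
for _ in range(budget):
    # Pick the example whose gradient is most correlated with the current residual.
    scores = per_example_grads @ residual
    scores[selected] = -np.inf
    selected.append(int(np.argmax(scores)))
    # Re-fit least-squares weights on the chosen subset.
    A = per_example_grads[selected].T        # shape (d, |S|)
    w, *_ = np.linalg.lstsq(A, full_grad, rcond=None)
    residual = full_grad - A @ w

print(selected, np.linalg.norm(residual))
```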

Large batch optimization for deep learning: Training BERT in 76 minutes

Y You, J Li, S Reddi, J Hseu, S Kumar… - arXiv preprint arXiv …, 2019 - arxiv.org
Training large deep neural networks on massive datasets is computationally very
challenging. There has been a recent surge of interest in using large-batch stochastic …
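A minimal sketch of the layerwise trust-ratio scaling at the core of the LAMB optimizer this entry introduces, assuming an Adam-style base update on a single parameter tensor; the beta, epsilon, and weight-decay values are illustrative defaults, not the paper's BERT training recipe.

```python
import numpy as np

def lamb_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-6, wd=0.01):
    # Adam-style first/second moment estimates with bias correction.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g ** 2
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    update = m_hat / (np.sqrt(v_hat) + eps) + wd * w
    # Layerwise trust ratio: scale the step by ||w|| / ||update||.
    w_norm, u_norm = np.linalg.norm(w), np.linalg.norm(update)
    trust = w_norm / u_norm if w_norm > 0 and u_norm > 0 else 1.0
    return w - lr * trust * update, m, v

w = np.ones(4); m = np.zeros(4); v = np.zeros(4)
w, m, v = lamb_step(w, w.copy(), m, v, t=1)  # toy gradient equal to the weights
print(w)
```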