A survey of techniques for optimizing deep learning on GPUs

S Mittal, S Vaishay - Journal of Systems Architecture, 2019 - Elsevier
The rise of deep learning (DL) has been fuelled by improvements in accelerators. Due to
its unique features, the GPU remains the most widely used accelerator for DL …

A survey on convolutional neural network accelerators: GPU, FPGA and ASIC

Y Hu, Y Liu, Z Liu - 2022 14th International Conference on …, 2022 - ieeexplore.ieee.org
In recent years, artificial intelligence (AI) has undergone rapid development and been applied in
various areas. Among a vast number of neural network (NN) models, the convolutional …

Varuna: scalable, low-cost training of massive deep learning models

S Athlur, N Saran, M Sivathanu, R Ramjee… - Proceedings of the …, 2022 - dl.acm.org
Systems for training massive deep learning models (billions of parameters) today assume
and require specialized "hyperclusters": hundreds or thousands of GPUs wired with …

Estimating GPU memory consumption of deep learning models

Y Gao, Y Liu, H Zhang, Z Li, Y Zhu, H Lin… - Proceedings of the 28th …, 2020 - dl.acm.org
Deep learning (DL) has been increasingly adopted by a variety of software-intensive
systems. Developers mainly use GPUs to accelerate the training, testing, and deployment of …
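
The paper builds a static estimator over the framework's computation graph. As a much cruder illustration of what such an estimate must account for, the sketch below is my own back-of-the-envelope arithmetic for fp32 training with an Adam-style optimizer; it is not the paper's estimator, and the activation term is left as an input because it depends on batch size and architecture.

```python
def estimate_training_bytes(n_params: int,
                            bytes_per_elem: int = 4,
                            activation_bytes: int = 0) -> int:
    """Crude lower bound on GPU memory for one fp32 training step.

    Persistent state per parameter (assumed Adam-style optimizer):
      weights + gradients + two optimizer moments = 4 copies.
    Activations vary with batch size and architecture, so pass them in.
    """
    persistent = 4 * n_params
    return persistent * bytes_per_elem + activation_bytes

# A 350M-parameter model already needs ~5.2 GiB before any activations.
print(estimate_training_bytes(350_000_000) / 2**30, "GiB")
```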

Checkmate: Breaking the memory wall with optimal tensor rematerialization

P Jain, A Jain, A Nrusimha, A Gholami… - Proceedings of …, 2020 - proceedings.mlsys.org
Modern neural networks are increasingly bottlenecked by the limited capacity of on-device
GPU memory. Prior work explores dropping activations as a strategy to scale to larger neural …
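
Checkmate formulates the choice of which activations to drop and recompute as an optimization problem and solves for a schedule. The underlying mechanism, recomputing activations during the backward pass instead of storing them, is what PyTorch exposes as gradient checkpointing; the sketch below shows that generic mechanism with a fixed even split, not Checkmate's optimal schedule.

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# A deep stack whose intermediate activations would normally all be kept.
layers = [torch.nn.Sequential(torch.nn.Linear(1024, 1024), torch.nn.ReLU())
          for _ in range(32)]
model = torch.nn.Sequential(*layers)

x = torch.randn(64, 1024, requires_grad=True)

# Keep only segment-boundary activations; everything inside each of the
# 4 segments is recomputed on the fly during backward.
out = checkpoint_sequential(model, 4, x, use_reentrant=False)
out.sum().backward()
```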

SwapAdvisor: Pushing deep learning beyond the GPU memory limit via smart swapping

CC Huang, G Jin, J Li - Proceedings of the Twenty-Fifth International …, 2020 - dl.acm.org
It is known that deeper and wider neural networks can achieve better accuracy. But it is
difficult to continue this trend of increasing model size due to limited GPU memory. One …
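
SwapAdvisor plans what to swap and when jointly with operator scheduling and memory allocation. The primitive it orchestrates is simple: copy tensors between GPU memory and pinned host memory on a side CUDA stream so transfers overlap compute. A hand-rolled sketch of that primitive (my own, far from the paper's planner):

```python
import torch

assert torch.cuda.is_available()          # sketch requires a GPU
copy_stream = torch.cuda.Stream()         # side stream to overlap compute

def swap_out(t: torch.Tensor) -> torch.Tensor:
    """Start an async copy of a GPU tensor into pinned host memory."""
    host = torch.empty(t.shape, dtype=t.dtype, pin_memory=True)
    with torch.cuda.stream(copy_stream):
        host.copy_(t, non_blocking=True)
    t.record_stream(copy_stream)          # defer reuse of t's GPU block
    return host

def swap_in(host: torch.Tensor) -> torch.Tensor:
    """Start an async copy of a pinned host tensor back to the GPU."""
    with torch.cuda.stream(copy_stream):
        return host.to("cuda", non_blocking=True)

a = torch.randn(4096, 4096, device="cuda")
h = swap_out(a)
del a                                     # safe: record_stream guards it
# ... other GPU work runs here while the copy proceeds ...
b = swap_in(h)
torch.cuda.current_stream().wait_stream(copy_stream)  # sync before using b
```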

Capuchin: Tensor-based GPU memory management for deep learning

X Peng, X Shi, H Dai, H Jin, W Ma, Q Xiong… - Proceedings of the …, 2020 - dl.acm.org
In recent years, deep learning has gained unprecedented success in various domains; the
key to this success is the larger and deeper deep neural networks (DNNs) that achieved very …
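
Capuchin tracks tensor access patterns at runtime and decides, per tensor, whether freeing its memory via swapping or via recomputation is cheaper. A toy version of that cost comparison is below; it is my own simplification, ignoring the overlap with compute and the memory-pressure feedback that the real system models.

```python
from dataclasses import dataclass

@dataclass
class TensorInfo:
    size_bytes: int       # footprint of the evicted tensor
    recompute_ms: float   # time to regenerate it from its inputs

PCIE_GBPS = 12.0          # assumed effective host-GPU bandwidth

def swap_in_ms(t: TensorInfo) -> float:
    # swap-out is often hidden behind compute, so count the swap-in
    return t.size_bytes / (PCIE_GBPS * 1e9) * 1e3

def evict_plan(t: TensorInfo) -> str:
    """Pick the cheaper way to free this tensor's memory."""
    return "swap" if swap_in_ms(t) < t.recompute_ms else "recompute"

# A 256 MiB activation that takes 5 ms to recompute: swapping it back
# in costs ~22 ms, so recomputation wins.
print(evict_plan(TensorInfo(size_bytes=256 * 2**20, recompute_ms=5.0)))
```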

EXACT: Scalable graph neural networks training via extreme activation compression

Z Liu, K Zhou, F Yang, L Li, R Chen… - … Conference on Learning …, 2021 - openreview.net
Training Graph Neural Networks (GNNs) on large graphs is a fundamental challenge due to
the high memory usage, which is mainly occupied by activations (e.g., node embeddings) …
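
EXACT shrinks the activations stashed for the backward pass (the paper combines quantization with random projection). A minimal sketch of just the quantization half: a custom autograd function that saves an int8 copy instead of fp32, cutting that activation's storage roughly 4x at the cost of a lossy gradient. This is generic per-tensor quantization, far simpler than EXACT's scheme.

```python
import torch

class QuantizedReLU(torch.autograd.Function):
    """ReLU that stores its saved activation as int8 instead of fp32."""

    @staticmethod
    def forward(ctx, x):
        y = x.relu()
        scale = y.abs().max().clamp(min=1e-8) / 127.0
        ctx.scale = scale
        ctx.save_for_backward((y / scale).round().to(torch.int8))
        return y

    @staticmethod
    def backward(ctx, grad_out):
        (q,) = ctx.saved_tensors
        y = q.to(grad_out.dtype) * ctx.scale  # dequantize (lossy)
        return grad_out * (y > 0)             # ReLU mask from the lossy copy

x = torch.randn(128, 1024, requires_grad=True)
QuantizedReLU.apply(x).sum().backward()
print(x.grad.abs().mean())
```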

Melon: Breaking the memory wall for resource-efficient on-device machine learning

Q Wang, M Xu, C Jin, X Dong, J Yuan, X Jin… - Proceedings of the 20th …, 2022 - dl.acm.org
On-device learning is a promising technique for emerging privacy-preserving machine
learning paradigms. However, through quantitative experiments, we find that commodity …
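
Melon fits training into a mobile memory budget by combining, among other things, recomputation (see the Checkmate sketch above) with micro-batch decomposition. The latter, shown below, is plain gradient accumulation: activations live for only one micro-batch at a time while gradients sum to the full-batch result. This is the generic technique, not Melon's budget-aware planner.

```python
import torch

model = torch.nn.Linear(512, 10)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.CrossEntropyLoss()

batch_x = torch.randn(64, 512)
batch_y = torch.randint(0, 10, (64,))

MICRO = 8  # only 8 samples' activations are live at once, not 64
opt.zero_grad()
for xs, ys in zip(batch_x.split(MICRO), batch_y.split(MICRO)):
    # weight each micro-batch so the summed grads match the full batch
    loss = loss_fn(model(xs), ys) * (len(xs) / len(batch_x))
    loss.backward()  # grads accumulate in .grad across iterations
opt.step()
```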

LLMCad: Fast and scalable on-device large language model inference

D Xu, W Yin, X Jin, Y Zhang, S Wei, M Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Generative tasks, such as text generation and question answering, hold a crucial position in
the realm of mobile applications. Due to their sensitivity to privacy concerns, there is a …
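
The snippet cuts off at the motivation; the full paper's core idea is generate-then-verify, where a small memory-resident model drafts tokens and the larger model only confirms or corrects them. A toy greedy version of that loop is sketched below with stand-in `draft`/`target` callables (hypothetical names); real systems verify a whole draft in one batched pass, sample rather than act greedily, and reuse KV caches.

```python
from typing import Callable, List

Token = int
Model = Callable[[List[Token]], Token]  # returns greedy next token

def generate_then_verify(draft: Model, target: Model,
                         prompt: List[Token], k: int, n_new: int) -> List[Token]:
    """Draft k tokens cheaply, keep the prefix the target agrees with,
    then append one token from the target itself."""
    seq = list(prompt)
    while len(seq) < len(prompt) + n_new:
        drafted: List[Token] = []
        for _ in range(k):                       # cheap drafting phase
            drafted.append(draft(seq + drafted))
        for i, tok in enumerate(drafted):        # verification phase
            if target(seq + drafted[:i]) != tok:
                drafted = drafted[:i]            # reject from the mismatch on
                break
        seq += drafted
        seq.append(target(seq))                  # target's own next token
    return seq[:len(prompt) + n_new]

# Toy stand-ins: draft guesses "last token + 1", target sums the sequence.
print(generate_then_verify(lambda s: (s[-1] + 1) % 7,
                           lambda s: sum(s) % 7,
                           [1, 2], k=4, n_new=6))
```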