A survey of techniques for optimizing deep learning on GPUs
The rise of deep-learning (DL) has been fuelled by the improvements in accelerators. Due to
its unique features, the GPU continues to remain the most widely used accelerator for DL …
A survey on convolutional neural network accelerators: GPU, FPGA and ASIC
Y Hu, Y Liu, Z Liu - 2022 14th International Conference on …, 2022 - ieeexplore.ieee.org
In recent years, artificial intelligence (AI) has been under rapid development, applied in
various areas. Among a vast number of neural network (NN) models, the convolutional …
Varuna: scalable, low-cost training of massive deep learning models
Systems for training massive deep learning models (billions of parameters) today assume
and require specialized "hyperclusters": hundreds or thousands of GPUs wired with …
Estimating GPU memory consumption of deep learning models
Deep learning (DL) has been increasingly adopted by a variety of software-intensive
systems. Developers mainly use GPUs to accelerate the training, testing, and deployment of …
Checkmate: Breaking the memory wall with optimal tensor rematerialization
Modern neural networks are increasingly bottlenecked by the limited capacity of on-device
GPU memory. Prior work explores dropping activations as a strategy to scale to larger neural …
SwapAdvisor: Pushing deep learning beyond the GPU memory limit via smart swapping
It is known that deeper and wider neural networks can achieve better accuracy. But it is
difficult to continue the trend to increase model size due to limited GPU memory. One …
Capuchin: Tensor-based GPU memory management for deep learning
In recent years, deep learning has gained unprecedented success in various domains, the
key of the success is the larger and deeper deep neural networks (DNNs) that achieved very …
EXACT: Scalable graph neural networks training via extreme activation compression
Training Graph Neural Networks (GNNs) on large graphs is a fundamental challenge due to
the high memory usage, which is mainly occupied by activations (e.g., node embeddings) …
Melon: Breaking the memory wall for resource-efficient on-device machine learning
On-device learning is a promising technique for emerging privacy-preserving machine
learning paradigms. However, through quantitative experiments, we find that commodity …
LLMCad: Fast and scalable on-device large language model inference
Generative tasks, such as text generation and question answering, hold a crucial position in
the realm of mobile applications. Due to their sensitivity to privacy concerns, there is a …