Enabling resource-efficient AIoT system with cross-level optimization: A survey
The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …
Fine-tuning giant neural networks on commodity hardware with automatic pipeline model parallelism
Fine-tuning is an increasingly common technique that leverages transfer learning to
dramatically expedite the training of huge, high-quality models. Critically, fine-tuning holds …
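Fine-tuning here means transfer learning: start from a pretrained model and update only a small part of it on the downstream task. A minimal sketch follows, assuming a torchvision ResNet-18 backbone (torchvision >= 0.13) and a hypothetical 10-class task; it illustrates the general technique only, not the paper's pipeline-parallel system.

import torch
import torch.nn as nn
from torchvision import models

# Pretrained backbone (ImageNet weights) as the transfer-learning starting point.
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze the pretrained parameters so only the new head is updated.
for p in model.parameters():
    p.requires_grad = False

# Replace the classifier head for a hypothetical 10-class downstream task.
model.fc = nn.Linear(model.fc.in_features, 10)

optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# One toy optimization step on random tensors standing in for downstream data.
x = torch.randn(8, 3, 224, 224)
y = torch.randint(0, 10, (8,))
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

Because gradients are needed only for the small head, each step updates far fewer parameters than full training, which is what makes fine-tuning large models on commodity hardware plausible once memory is handled (here, via the paper's automatic pipeline model parallelism).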
vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training
DNNs of increasing computational complexity have achieved unprecedented successes in
various areas such as machine vision and natural language processing (NLP), e.g., the …
TSplit: Fine-grained GPU memory management for efficient DNN training via tensor splitting
As Deep Neural Networks (DNNs) become deeper and larger, performing DNN training on
existing accelerators (e.g., GPUs) is challenging due to their limited device memory capacity …
A Survey on Spatio-temporal Big Data Analytics Ecosystem: Resource Management, Processing Platform, and Applications
With the rapid evolution of the Internet, Internet of Things (IoT), and geographic information
systems (GIS), spatio-temporal Big Data (STBD) is experiencing exponential growth …
Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers
S Singh, P Singhania, A Ranjan… - … Conference for High …, 2024 - ieeexplore.ieee.org
Training and fine-tuning large language models (LLMs) with hundreds of billions to trillions
of parameters requires tens of thousands of GPUs, and a highly scalable software stack. In …
MegTaiChi: Dynamic tensor-based memory management optimization for DNN training
Z Hu, J Xiao, Z Deng, M Li, K Zhang, X Zhang… - Proceedings of the 36th …, 2022 - dl.acm.org
In real applications, it is common to train deep neural networks (DNNs) on modest clusters.
With the continuous increase of model size and batch size, the training of DNNs becomes …
An oracle for guiding large-scale model/hybrid parallel training of convolutional neural networks
Deep Neural Network (DNN) frameworks use distributed training to enable faster time to
convergence and alleviate memory capacity limitations when training large models and/or …
PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications
Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU
implementations have a loop on the host side that invokes the GPU kernel as many times as …
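The pattern this abstract describes, a host-side loop that re-launches the same kernel on every iteration, can be sketched as follows. This is a hypothetical CuPy example of an iterative memory-bound solver (a trivial Jacobi-style smoother), not code from the PERKS paper.

import cupy as cp

# Hypothetical one-dimensional Jacobi-style smoothing kernel, standing in for
# an iterative memory-bound solver.
jacobi_step = cp.RawKernel(r'''
extern "C" __global__
void jacobi_step(const float* x, float* y, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int l = i > 0 ? i - 1 : i;
        int r = i < n - 1 ? i + 1 : i;
        y[i] = 0.5f * (x[l] + x[r]);
    }
}
''', 'jacobi_step')

n = 1 << 20
x = cp.random.rand(n, dtype=cp.float32)
y = cp.empty_like(x)
threads = 256
blocks = (n + threads - 1) // threads

# The host-side loop the abstract refers to: every pass re-launches the kernel,
# so the working set is re-read from device memory on each iteration rather
# than staying resident in on-chip storage.
for _ in range(100):
    jacobi_step((blocks,), (threads,), (x, y, cp.int32(n)))
    x, y = y, x

Each relaunch streams the whole working set through device memory again, which is the repeated traffic a locality-optimized execution model for iterative memory-bound applications would aim to avoid.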
An application-oblivious memory scheduling system for DNN accelerators
Deep Neural Networks (DNNs) tend to go deeper and wider, which poses a significant
challenge to their training due to the limited memory capacity of DNN accelerators …