Enabling resource-efficient AIoT system with cross-level optimization: A survey

S Liu, B Guo, C Fang, Z Wang, S Luo… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …

Fine-tuning giant neural networks on commodity hardware with automatic pipeline model parallelism

S Eliad, I Hakimi, A De Jagger, M Silberstein… - 2021 USENIX Annual …, 2021 - usenix.org
Fine-tuning is an increasingly common technique that leverages transfer learning to
dramatically expedite the training of huge, high-quality models. Critically, fine-tuning holds …
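
To make the technique named in the title concrete, here is a minimal sketch of pipeline model parallelism, not FTPipe's automatic algorithm: the model is cut into stages placed on different devices, and each batch is split into micro-batches so that, in a real pipeline, the stages can work on different micro-batches concurrently. The stage boundaries, layer sizes, and device names are illustrative assumptions.

```python
# A minimal sketch of pipeline model parallelism (illustrative only).
import torch
import torch.nn as nn

dev0 = torch.device("cuda:0" if torch.cuda.device_count() > 1 else "cpu")
dev1 = torch.device("cuda:1" if torch.cuda.device_count() > 1 else "cpu")

# Two pipeline stages on (ideally) different devices.
stage0 = nn.Sequential(nn.Linear(512, 1024), nn.ReLU()).to(dev0)
stage1 = nn.Sequential(nn.Linear(1024, 10)).to(dev1)

def pipelined_forward(batch, n_micro=4):
    # Split the batch into micro-batches; a real pipeline overlaps the
    # stages, here we only show the data movement pattern.
    outputs = []
    for micro in batch.chunk(n_micro):
        act = stage0(micro.to(dev0))          # stage 0 on device 0
        outputs.append(stage1(act.to(dev1)))  # stage 1 on device 1
    return torch.cat(outputs)

out = pipelined_forward(torch.randn(32, 512))
print(out.shape)  # torch.Size([32, 10])
```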

vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training

S Zhao, F Li, X Chen, X Guan, J Jiang… - … on Parallel and …, 2021 - ieeexplore.ieee.org
DNNs of increasing computational complexity have achieved unprecedented successes in
various areas such as machine vision and natural language processing (NLP), e.g., the …

TSplit: Fine-grained GPU memory management for efficient DNN training via tensor splitting

X Nie, X Miao, Z Yang, B Cui - 2022 IEEE 38th International …, 2022 - ieeexplore.ieee.org
As Deep Neural Networks (DNNs) grow deeper and larger, training them on
existing accelerators (e.g., GPUs) is challenging due to their limited device memory capacity …
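
The splitting idea in the title can be sketched in a few lines, under the assumption of a simple elementwise layer; the function `layer`, the tensor sizes, and the split count below are illustrative and not TSplit's actual system. Operating on micro-tensors one at a time bounds peak memory by a split rather than the full intermediate tensor.

```python
# A toy illustration of tensor splitting for memory efficiency.
import torch

def layer(x):
    # Stand-in for a memory-hungry elementwise layer.
    return torch.relu(x) * 2.0

def run_split(x, n_splits=8):
    # Each split is computed independently, so at most one split's
    # intermediate results are alive at a time.
    return torch.cat([layer(part) for part in x.split(x.shape[0] // n_splits)])

x = torch.randn(1024, 4096)
assert torch.equal(run_split(x), layer(x))  # same result, lower peak memory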

A Survey on Spatio-temporal Big Data Analytics Ecosystem: Resource Management, Processing Platform, and Applications

H Liang, Z Zhang, C Hu, Y Gong… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the rapid evolution of the Internet, Internet of Things (IoT), and geographic information
systems (GIS), spatio-temporal Big Data (STBD) is experiencing exponential growth …

Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers

S Singh, P Singhania, A Ranjan… - … Conference for High …, 2024 - ieeexplore.ieee.org
Training and fine-tuning large language models (LLMs) with hundreds of billions to trillions
of parameters requires tens of thousands of GPUs and a highly scalable software stack. In …

MegTaiChi: Dynamic tensor-based memory management optimization for DNN training

Z Hu, J Xiao, Z Deng, M Li, K Zhang, X Zhang… - Proceedings of the 36th …, 2022 - dl.acm.org
In real applications, it is common to train deep neural networks (DNNs) on modest clusters.
As model and batch sizes continue to grow, the training of DNNs becomes …
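
One ingredient that dynamic tensor-based memory managers trade off automatically is recomputation: discard a block's activations in the forward pass and re-run the block during backward. Below is a minimal sketch using PyTorch's built-in `torch.utils.checkpoint`; the model and sizes are illustrative assumptions, and this is not MegTaiChi's actual policy.

```python
# Trading memory for recomputation with activation checkpointing.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

block = nn.Sequential(nn.Linear(256, 256), nn.ReLU(), nn.Linear(256, 256))
head = nn.Linear(256, 1)

x = torch.randn(64, 256, requires_grad=True)
y = checkpoint(block, x, use_reentrant=False)  # activations not stored
loss = head(y).sum()
loss.backward()  # block's forward is re-run here to rebuild activations
print(x.grad.shape)
```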

An oracle for guiding large-scale model/hybrid parallel training of convolutional neural networks

AN Kahira, TT Nguyen, LB Gomez, R Takano… - Proceedings of the 30th …, 2021 - dl.acm.org
Deep Neural Network (DNN) frameworks use distributed training to enable faster time to
convergence and alleviate memory capacity limitations when training large models and/or …
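
The kind of question such an oracle answers can be illustrated with back-of-the-envelope arithmetic. The formulas below are simplified assumptions, not the paper's performance model: they compare per-GPU memory under data parallelism (weights replicated, batch sharded) versus model parallelism (weights sharded, batch replicated).

```python
# A simplified per-GPU memory estimate for two parallelization modes.
def per_gpu_memory(weight_bytes, act_bytes_per_sample, batch, gpus, mode):
    if mode == "data":   # full weights, 1/gpus of the batch
        return weight_bytes + act_bytes_per_sample * batch / gpus
    if mode == "model":  # 1/gpus of the weights, full batch
        return weight_bytes / gpus + act_bytes_per_sample * batch
    raise ValueError(mode)

W, A, B, G = 2e9, 4e6, 256, 8  # illustrative bytes / bytes-per-sample / samples / GPUs
for mode in ("data", "model"):
    print(mode, f"{per_gpu_memory(W, A, B, G, mode) / 1e9:.2f} GB/GPU")
```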

PERKS: a Locality-Optimized Execution Model for Iterative Memory-bound GPU Applications

L Zhang, M Wahib, P Chen, J Meng, X Wang… - Proceedings of the 37th …, 2023 - dl.acm.org
Iterative memory-bound solvers commonly occur in HPC codes. Typical GPU
implementations have a loop on the host side that invokes the GPU kernel as much as …
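
The restructuring PERKS proposes can be shown schematically. Real PERKS targets CUDA kernels and on-chip memory; the pure-Python analogy below is an illustrative stand-in only. The baseline re-invokes the "kernel" from a host-side loop and re-fetches its working set every iteration, while the persistent version moves the loop inside the kernel so the working set stays in a fast local cache across iterations.

```python
# A schematic Python analogy of the persistent-kernel idea.
import numpy as np

def kernel_step(slow_memory):
    data = slow_memory.copy()   # re-fetch working set on every launch
    data += 1.0                 # one solver iteration (stand-in)
    slow_memory[:] = data

def host_loop(slow_memory, iters):
    for _ in range(iters):      # one kernel launch per iteration
        kernel_step(slow_memory)

def persistent_kernel(slow_memory, iters):
    cache = slow_memory.copy()  # fetch once into the "on-chip" cache
    for _ in range(iters):      # iterate without relaunching
        cache += 1.0
    slow_memory[:] = cache      # write back once at the end

a, b = np.zeros(4), np.zeros(4)
host_loop(a, 10)
persistent_kernel(b, 10)
assert np.array_equal(a, b)
```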

An application-oblivious memory scheduling system for DNN accelerators

J Li, X Wang, X Chen, G Li, X Dong, P Zhao… - ACM Transactions on …, 2022 - dl.acm.org
Deep Neural Networks (DNNs) tend to go deeper and wider, which poses a significant
challenge to DNN training due to the limited memory capacity of DNN accelerators …
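
A minimal sketch of the swap-based scheduling such systems automate follows; the device names and manual `.to()` calls are illustrative assumptions, not the paper's scheduler. An activation that is not needed until the backward pass is moved to host memory and brought back on demand, freeing accelerator memory in between.

```python
# Manually swapping an activation to host memory and back.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

act = torch.randn(1024, 1024, device=device)  # produced in the forward pass
act_host = act.to("cpu", non_blocking=True)   # swap out to host memory
del act                                       # free device memory early

# ... other layers run here, using the freed device memory ...

act = act_host.to(device, non_blocking=True)  # swap back before backward
print(act.device)
```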