Enabling resource-efficient AIoT system with cross-level optimization: A survey

S Liu, B Guo, C Fang, Z Wang, S Luo… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …

Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision

W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …

Enable deep learning on mobile devices: Methods, systems, and applications

H Cai, J Lin, Y Lin, Z Liu, H Tang, H Wang… - ACM Transactions on …, 2022 - dl.acm.org
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial
intelligence (AI), including computer vision, natural language processing, and speech …

Efficient spatially sparse inference for conditional GANs and diffusion models

M Li, J Lin, C Meng, S Ermon… - Advances in neural …, 2022 - proceedings.neurips.cc
During image editing, existing deep generative models tend to re-synthesize the entire
output from scratch, including the unedited regions. This leads to a significant waste of …

Apollo: Automatic partition-based operator fusion through layer by layer optimization

J Zhao, X Gao, R Xia, Z Zhang… - Proceedings of …, 2022 - proceedings.mlsys.org
We study fusion for deep neural networks (DNNs) in a just-in-time (JIT) compilation
framework Apollo. It considers both memory- and compute-bound tensor operators for fusion …

Automated runtime-aware scheduling for multi-tenant DNN inference on GPU

F Yu, S Bray, D Wang, L Shangguan… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
With the fast development of deep neural networks (DNNs), many real-world applications
are adopting multiple models to conduct compound tasks, such as co-running classification …

A survey of multi-tenant deep learning inference on GPU

F Yu, D Wang, L Shangguan, M Zhang, C Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Deep Learning (DL) models have achieved superior performance. Meanwhile, computing
hardware like NVIDIA GPUs also demonstrated strong computing scaling trends with 2x …

Juggler-ResNet: A flexible and high-speed ResNet optimization method for intrusion detection system in software-defined industrial networks

Z Zhu, W Zhai, H Liu, J Geng, M Zhou… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
ResNets are widely used in the intrusion detection system (IDS) of software-defined
industrial network to construct accurate intelligence detection of network attacks. However …

DyCL: Dynamic neural network compilation via program rewriting and graph optimization

S Chen, S Wei, C Liu, W Yang - Proceedings of the 32nd ACM SIGSOFT …, 2023 - dl.acm.org
The deep learning (DL) compiler serves as a vital infrastructure component to enable the
deployment of deep neural networks on diverse hardware platforms such as mobile devices …

Out-of-order backprop: An effective scheduling technique for deep learning

H Oh, J Lee, H Kim, J Seo - … of the Seventeenth European Conference on …, 2022 - dl.acm.org
Neural network training requires a large amount of computation and thus GPUs are often
used for the acceleration. While they improve the performance, GPUs are underutilized …