Enabling resource-efficient AIoT system with cross-level optimization: A survey
The emerging field of artificial intelligence of things (AIoT, AI + IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …
Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision
Deep learning (DL) shows its prosperity in a wide variety of fields. The development of a DL
model is a time-consuming and resource-intensive procedure. Hence, dedicated GPU …
Enable deep learning on mobile devices: Methods, systems, and applications
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial
intelligence (AI), including computer vision, natural language processing, and speech …
Efficient spatially sparse inference for conditional GANs and diffusion models
During image editing, existing deep generative models tend to re-synthesize the entire
output from scratch, including the unedited regions. This leads to a significant waste of …
Apollo: Automatic partition-based operator fusion through layer by layer optimization
J Zhao, X Gao, R Xia, Z Zhang… - Proceedings of …, 2022 - proceedings.mlsys.org
We study fusion for deep neural networks (DNNs) in a just-in-time (JIT) compilation
framework Apollo. It considers both memory- and compute-bound tensor operators for fusion …
Automated runtime-aware scheduling for multi-tenant DNN inference on GPU
With the fast development of deep neural networks (DNNs), many real-world applications
are adopting multiple models to conduct compound tasks, such as co-running classification …
A survey of multi-tenant deep learning inference on GPU
Deep Learning (DL) models have achieved superior performance. Meanwhile, computing
hardware like NVIDIA GPUs also demonstrated strong computing scaling trends with 2x …
Juggler-ResNet: A flexible and high-speed ResNet optimization method for intrusion detection system in software-defined industrial networks
Z Zhu, W Zhai, H Liu, J Geng, M Zhou… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
ResNets are widely used in the intrusion detection system (IDS) of software-defined
industrial network to construct accurate intelligence detection of network attacks. However …
DyCL: Dynamic neural network compilation via program rewriting and graph optimization
The deep learning (DL) compiler serves as a vital infrastructure component to enable the
deployment of deep neural networks on diverse hardware platforms such as mobile devices …
Out-of-order backprop: An effective scheduling technique for deep learning
Neural network training requires a large amount of computation and thus GPUs are often
used for the acceleration. While they improve the performance, GPUs are underutilized …