Enabling resource-efficient AIoT system with cross-level optimization: A survey

S Liu, B Guo, C Fang, Z Wang, S Luo… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …

Towards artificial general intelligence (AGI) in the Internet of Things (IoT): Opportunities and challenges

F Dou, J Ye, G Yuan, Q Lu, W Niu, H Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial General Intelligence (AGI), possessing the capacity to comprehend, learn, and
execute tasks with human cognitive abilities, engenders significant anticipation and intrigue …

Apollo: Automatic partition-based operator fusion through layer by layer optimization

J Zhao, X Gao, R Xia, Z Zhang… - Proceedings of …, 2022 - proceedings.mlsys.org
We study fusion for deep neural networks (DNNs) in a just-in-time (JIT) compilation
framework Apollo. It considers both memory- and compute-bound tensor operators for fusion …

A survey of resource-efficient LLM and multimodal foundation models

M Xu, W Yin, D Cai, R Yi, D Xu, Q Wang, B Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …

AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures

Z Zheng, X Yang, P Zhao, G Long, K Zhu… - Proceedings of the 27th …, 2022 - dl.acm.org
This work reveals that memory-intensive computation is a rising performance-critical factor in
recent machine learning models. Due to a unique set of new challenges, existing ML …

USHER: Holistic Interference Avoidance for Resource Optimized ML Inference

SS Shubha, H Shen, A Iyer - 18th USENIX Symposium on Operating …, 2024 - usenix.org
Minimizing monetary cost and maximizing the goodput of inference serving systems are
increasingly important with the ever-increasing popularity of deep learning models. While it …

DreamShard: Generalizable embedding table placement for recommender systems

D Zha, L Feng, Q Tan, Z Liu, KH Lai… - Advances in …, 2022 - proceedings.neurips.cc
We study embedding table placement for distributed recommender systems, which aims to
partition and place the tables on multiple hardware devices (e.g., GPUs) to balance the …

Large models for intelligent transportation systems and autonomous vehicles: A survey

L Gan, W Chu, G Li, X Tang, K Li - Advanced Engineering Informatics, 2024 - Elsevier
Large models are widely used in intelligent transportation systems (ITS) and autonomous
vehicles (AV) due to their excellent new capabilities such as intelligence emergence …

Understanding GNN computational graph: A coordinated computation, IO, and memory perspective

H Zhang, Z Yu, G Dai, G Huang… - Proceedings of …, 2022 - proceedings.mlsys.org
Graph Neural Networks (GNNs) have been widely used in various domains, and
GNNs with sophisticated computational graph lead to higher latency and larger memory …

TileFlow: A framework for modeling fusion dataflow via tree-based analysis

S Zheng, S Chen, S Gao, L Jia, G Sun… - Proceedings of the 56th …, 2023 - dl.acm.org
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …