Enabling resource-efficient AIoT system with cross-level optimization: A survey
The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …
Towards artificial general intelligence (AGI) in the internet of things (IoT): Opportunities and challenges
Artificial General Intelligence (AGI), possessing the capacity to comprehend, learn, and
execute tasks with human cognitive abilities, engenders significant anticipation and intrigue …
Apollo: Automatic partition-based operator fusion through layer by layer optimization
J Zhao, X Gao, R Xia, Z Zhang… - Proceedings of …, 2022 - proceedings.mlsys.org
We study fusion for deep neural networks (DNNs) in a just-in-time (JIT) compilation
framework Apollo. It considers both memory- and compute-bound tensor operators for fusion …
A survey of resource-efficient LLM and multimodal foundation models
Large foundation models, including large language models (LLMs), vision transformers
(ViTs), diffusion, and LLM-based multimodal models, are revolutionizing the entire machine …
AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures
This work reveals that memory-intensive computation is a rising performance-critical factor in
recent machine learning models. Due to a unique set of new challenges, existing ML …
USHER: Holistic Interference Avoidance for Resource Optimized ML Inference
Minimizing monetary cost and maximizing the goodput of inference serving systems are
increasingly important with the ever-increasing popularity of deep learning models. While it …
DreamShard: Generalizable embedding table placement for recommender systems
We study embedding table placement for distributed recommender systems, which aims to
partition and place the tables on multiple hardware devices (e.g., GPUs) to balance the …
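The snippet above frames embedding table placement as partitioning tables across devices so that per-device cost is balanced. It does not say how DreamShard itself solves this, so the following is only a minimal sketch of the underlying placement problem using a classic greedy heuristic (longest-processing-time first), not DreamShard's method. The function name `greedy_place` and the per-table cost numbers are hypothetical; real systems would estimate costs from table size, access patterns, and communication.

```python
import heapq
from typing import Dict, List, Tuple


def greedy_place(table_costs: Dict[str, float], num_devices: int) -> Tuple[List[List[str]], List[float]]:
    """Assign each table to the currently least-loaded device, most expensive tables first.

    table_costs: hypothetical per-table cost estimates (e.g. lookup time in ms).
    Returns the list of tables on each device and each device's total cost.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    placement: List[List[str]] = [[] for _ in range(num_devices)]
    loads: List[float] = [0.0] * num_devices
    # Min-heap of (current load, device index): the least-loaded device pops first.
    heap = [(0.0, d) for d in range(num_devices)]
    heapq.heapify(heap)
    # Placing large tables first leaves small tables to even out the remaining gaps.
    for name, cost in sorted(table_costs.items(), key=lambda kv: kv[1], reverse=True):
        load, dev = heapq.heappop(heap)
        placement[dev].append(name)
        loads[dev] = load + cost
        heapq.heappush(heap, (loads[dev], dev))
    return placement, loads


if __name__ == "__main__":
    costs = {"t0": 7.0, "t1": 5.0, "t2": 4.0, "t3": 3.0, "t4": 1.0}  # hypothetical costs
    placement, loads = greedy_place(costs, num_devices=2)
    print(placement, loads)  # [['t0', 't3'], ['t1', 't2', 't4']] [10.0, 10.0]
```

Greedy balancing like this ignores interactions between tables on the same device (e.g. shared communication or memory limits), which is one reason learned cost models are attractive for this problem.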
Large models for intelligent transportation systems and autonomous vehicles: A survey
Large models are widely used in intelligent transportation systems (ITS) and autonomous
vehicles (AV) due to their excellent new capabilities such as intelligence emergence …
Understanding GNN computational graph: A coordinated computation, IO, and memory perspective
Graph Neural Networks (GNNs) have been widely used in various domains, and
GNNs with sophisticated computational graphs lead to higher latency and larger memory …
TileFlow: A framework for modeling fusion dataflow via tree-based analysis
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …
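The TileFlow snippet motivates fusion by the gap between compute performance and memory bandwidth: fusing layers avoids writing intermediates off-chip. The toy sketch below illustrates only that intuition for a chain of elementwise operators; it is not TileFlow's tree-based analysis or Apollo's partitioning. The traffic model (each unfused pass reads its whole input and writes its whole output, while a fused pass keeps intermediates in a local variable) is a simplifying assumption, and `run_unfused` / `run_fused` are hypothetical names.

```python
from typing import Callable, List, Sequence, Tuple

# An elementwise operator: maps one value to one value (e.g. scale, bias-add, ReLU).
Op = Callable[[float], float]


def run_unfused(x: Sequence[float], ops: List[Op]) -> Tuple[List[float], int]:
    """Run each op as its own pass, materializing every intermediate result.

    Toy traffic model (an assumption): each pass reads its whole input from
    off-chip memory and writes its whole output back, i.e. 2 * len(x) transfers.
    """
    buf = list(x)
    traffic = 0
    for op in ops:
        buf = [op(v) for v in buf]
        traffic += 2 * len(buf)
    return buf, traffic


def run_fused(x: Sequence[float], ops: List[Op]) -> Tuple[List[float], int]:
    """Run all ops in a single pass; intermediates live only in the local variable v.

    Same toy model: the input is read once and the final output is written once.
    """
    out: List[float] = []
    for v in x:
        for op in ops:
            v = op(v)
        out.append(v)
    return out, 2 * len(x)


if __name__ == "__main__":
    ops: List[Op] = [lambda v: v * 2.0, lambda v: v + 1.0, lambda v: max(v, 0.0)]
    x = [1.0, 2.0, 3.0, 4.0]
    print(run_unfused(x, ops))  # ([3.0, 5.0, 7.0, 9.0], 24)
    print(run_fused(x, ops))    # ([3.0, 5.0, 7.0, 9.0], 8)
```

Under this model, k chained unary ops over n elements cost 2kn transfers unfused versus 2n fused. Fusing compute-bound layers such as convolutions or matrix multiplications additionally involves tiling and on-chip buffer trade-offs that this sketch ignores.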