A graph placement methodology for fast chip design
Chip floorplanning is the engineering task of designing the physical layout of a computer
chip. Despite five decades of research 1, chip floorplanning has defied automation, requiring …
A survey of machine learning for computer architecture and systems
It has been a long time that computer architecture and systems are optimized for efficient
execution of machine learning (ML) models. Now, it is time to reconsider the relationship …
Learning scheduling algorithms for data processing clusters
Efficiently scheduling data processing jobs on distributed compute clusters requires complex
algorithms. Current systems use simple, generalized heuristics and ignore workload …
Enabling resource-efficient AIoT system with cross-level optimization: A survey
The emerging field of artificial intelligence of things (AIoT, AI+ IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …
Chip placement with deep reinforcement learning
In this work, we present a learning-based approach to chip placement, one of the most
complex and time-consuming stages of the chip design process. Unlike prior methods, our …
SiP-ML: high-bandwidth optical network interconnects for machine learning training
This paper proposes optical network interconnects as a key enabler for building high-
bandwidth ML training clusters with strong scaling properties. Our design, called SiP-ML …
Verifying learning-augmented systems
The application of deep reinforcement learning (DRL) to computer and networked systems
has recently gained significant popularity. However, the obscurity of decisions by DRL …
DreamShard: Generalizable embedding table placement for recommender systems
We study embedding table placement for distributed recommender systems, which aims to
partition and place the tables on multiple hardware devices (e.g., GPUs) to balance the …
A learned performance model for tensor processing units
Accurate hardware performance models are critical to efficient code generation. They can be
used by compilers to make heuristic decisions, by superoptimizers as a minimization …
Piper: Multidimensional planner for DNN parallelization
The rapid increase in sizes of state-of-the-art DNN models, and consequently the increase in
the compute and memory requirements of model training, has led to the development of …