A graph placement methodology for fast chip design
Chip floorplanning is the engineering task of designing the physical layout of a computer
chip. Despite five decades of research 1, chip floorplanning has defied automation, requiring …
Enabling resource-efficient AIoT system with cross-level optimization: A survey
The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …
Full stack optimization of transformer inference: a survey
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …
Evaluating language models for efficient code generation
We introduce Differential Performance Evaluation (DPE), a framework designed to reliably
evaluate Large Language Models (LLMs) for efficient code generation. Traditional coding …
SurCo: Learning linear surrogates for combinatorial nonlinear optimization problems
Optimization problems with nonlinear cost functions and combinatorial constraints appear in
many real-world applications but remain challenging to solve efficiently compared to their …
HASCO: Towards agile hardware and software co-design for tensor computation
Tensor computations overwhelm traditional general-purpose computing devices due to the
large amounts of data and operations of the computations. They call for a holistic solution …
A learned performance model for tensor processing units
Accurate hardware performance models are critical to efficient code generation. They can be
used by compilers to make heuristic decisions, by superoptimizers as a minimization …
A full-stack search technique for domain optimized deep learning accelerators
The rapidly-changing deep learning landscape presents a unique opportunity for building
inference accelerators optimized for specific datacenter-scale workloads. We propose Full …
Piper: Multidimensional planner for DNN parallelization
The rapid increase in sizes of state-of-the-art DNN models, and consequently the increase in
the compute and memory requirements of model training, has led to the development of …
Robust scheduling with GFlowNets
Finding the best way to schedule operations in a computation graph is a classical NP-hard
problem which is central to compiler optimization. However, evaluating the goodness of a …