Taskflow: A lightweight parallel and heterogeneous task graph computing system
Taskflow aims to streamline the building of parallel and heterogeneous applications using a
lightweight task graph-based approach. Taskflow introduces an expressive task graph …
GPU-accelerated static timing analysis
The ever-increasing power of graphics processing units (GPUs) has opened new
opportunities for accelerating static timing analysis (STA) to a new milestone. Developing a …
OpenTimer v2: A new parallel incremental timing analysis engine
Since the first release in 2015, OpenTimer v1 has been used in many industrial and
academic projects for analyzing the timing of custom designs. After four-year research and …
MAGICAL: Toward fully automated analog IC layout leveraging human and machine intelligence
Despite tremendous advancement of digital IC design automation tools over the last few
decades, analog IC layout is still heavily manual which is very tedious and error-prone. This …
SNICIT: Accelerating sparse neural network inference via compression at inference time on GPU
Sparse deep neural network (DNN) has become an important technique for reducing the
inference cost of large DNNs. However, computing large sparse DNNs is very challenging …
From RTL to CUDA: A GPU acceleration flow for RTL simulation with batch stimulus
High-throughput RTL simulation is critical for verifying today's highly complex SoCs. Recent
research has explored accelerating RTL simulation by leveraging event-driven approaches …
G-PASTA: GPU-accelerated partitioning algorithm for static timing analysis
Recent static timing analysis (STA) engines have leveraged task dependency graph (TDG)
parallelism to accelerate various STA algorithms, including graph-based analysis and path …
ABCDPlace: Accelerated batch-based concurrent detailed placement on multithreaded CPUs and GPUs
Placement is an important step in modern very-large-scale integrated (VLSI) designs.
Detailed placement is a placement refining procedure intensively called throughout the …
CEDR: A compiler-integrated, extensible DSSoC runtime
In this work, we present a Compiler-integrated, Extensible Domain Specific System on
Chip Runtime (CEDR) ecosystem to facilitate research toward addressing the challenges of …
Programming Dynamic Task Parallelism for Heterogeneous EDA Algorithms
Many EDA applications are extremely sparse, irregular, and control-flow intensive.
Parallelizing this type of application can benefit from the ability to express dynamic task …