Taskflow: A lightweight parallel and heterogeneous task graph computing system

TW Huang, DL Lin, CX Lin, Y Lin - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Taskflow aims to streamline the building of parallel and heterogeneous applications using a
lightweight task graph-based approach. Taskflow introduces an expressive task graph …

GPU-accelerated static timing analysis

Z Guo, TW Huang, Y Lin - … of the 39th international conference on …, 2020 - dl.acm.org
The ever-increasing power of graphics processing units (GPUs) has opened new
opportunities for accelerating static timing analysis (STA) to a new milestone. Developing a …

OpenTimer v2: A new parallel incremental timing analysis engine

TW Huang, G Guo, CX Lin… - IEEE transactions on …, 2020 - ieeexplore.ieee.org
Since the first release in 2015, OpenTimer v1 has been used in many industrial and
academic projects for analyzing the timing of custom designs. After four-year research and …

MAGICAL: Toward fully automated analog IC layout leveraging human and machine intelligence

B Xu, K Zhu, M Liu, Y Lin, S Li, X Tang… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
Despite tremendous advancement of digital IC design automation tools over the last few
decades, analog IC layout is still heavily manual which is very tedious and error-prone. This …

SNICIT: Accelerating sparse neural network inference via compression at inference time on GPU

S Jiang, TW Huang, B Yu, TY Ho - Proceedings of the 52nd International …, 2023 - dl.acm.org
Sparse deep neural network (DNN) has become an important technique for reducing the
inference cost of large DNNs. However, computing large sparse DNNs is very challenging …

From RTL to CUDA: A GPU acceleration flow for RTL simulation with batch stimulus

DL Lin, H Ren, Y Zhang, B Khailany… - Proceedings of the 51st …, 2022 - dl.acm.org
High-throughput RTL simulation is critical for verifying today's highly complex SoCs. Recent
research has explored accelerating RTL simulation by leveraging event-driven approaches …

G-PASTA: GPU-accelerated partitioning algorithm for static timing analysis

B Zhang, DL Lin, C Chang, CH Chiu, B Wang… - Proceedings of the 61st …, 2024 - dl.acm.org
Recent static timing analysis (STA) engines have leveraged task dependency graph (TDG)
parallelism to accelerate various STA algorithms, including graph-based analysis and path …

ABCDPlace: Accelerated batch-based concurrent detailed placement on multithreaded CPUs and GPUs

Y Lin, W Li, J Gu, H Ren, B Khailany… - IEEE transactions on …, 2020 - ieeexplore.ieee.org
Placement is an important step in modern very-large-scale integrated (VLSI) designs.
Detailed placement is a placement refining procedure intensively called throughout the …

CEDR: A compiler-integrated, extensible DSSoC runtime

J Mack, S Hassan, N Kumbhare… - ACM Transactions on …, 2023 - dl.acm.org
In this work, we present a Compiler-integrated, Extensible Domain Specific System on
Chip Runtime (CEDR) ecosystem to facilitate research toward addressing the challenges of …

Programming Dynamic Task Parallelism for Heterogeneous EDA Algorithms

CH Chiu, DL Lin, TW Huang - 2023 IEEE/ACM International …, 2023 - ieeexplore.ieee.org
Many EDA applications are extremely sparse, irregular, and control-flow intensive.
Parallelizing this type of application can benefit from the ability to express dynamic task …