Review of chiplet-based design: system architecture and interconnection

Y Liu, X Li, S Yin - Science China Information Sciences, 2024 - Springer
Chiplet-based design, which breaks a system into multiple smaller dice (or “chiplets”) and
reassembles them into a new system chip through advanced packaging, has received …

Adapt-NoC: A flexible network-on-chip design for heterogeneous manycore architectures

H Zheng, K Wang, A Louri - 2021 IEEE international symposium …, 2021 - ieeexplore.ieee.org
The increased computational capability in heterogeneous manycore architectures facilitates
the concurrent execution of many applications. This requires, among other things, a flexible …

On-chip communication network for efficient training of deep convolutional networks on heterogeneous manycore systems

W Choi, K Duraisamy, RG Kim… - IEEE Transactions …, 2017 - ieeexplore.ieee.org
Convolutional Neural Networks (CNNs) have shown a great deal of success in diverse
application domains including computer vision, speech recognition, and natural language …

Learning-based application-agnostic 3D NoC design for heterogeneous manycore systems

BK Joardar, RG Kim, JR Doppa… - IEEE Transactions …, 2018 - ieeexplore.ieee.org
The rising use of deep learning and other big-data algorithms has led to an increasing
demand for hardware platforms that are computationally powerful, yet energy-efficient. Due …

A versatile and flexible chiplet-based system design for heterogeneous manycore architectures

H Zheng, K Wang, A Louri - 2020 57th ACM/IEEE Design …, 2020 - ieeexplore.ieee.org
Heterogeneous manycore architectures are deployed to simultaneously run multiple and
diverse applications. This requires various computing capabilities (CPUs, GPUs, and …

Opportunistic computing in GPU architectures

A Pattnaik, X Tang, O Kayiran, A Jog, A Mishra… - Proceedings of the 46th …, 2019 - dl.acm.org
Data transfer overhead between computing cores and memory hierarchy has been a
persistent issue for von Neumann architectures and the problem has only become more …

Morpheus: Extending the last level cache capacity in GPU systems using idle GPU core resources

S Darabi, M Sadrosadati, N Akbarzadeh… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are widely-used accelerators for data-parallel
applications. In many GPU applications, GPU memory bandwidth bottlenecks performance …

OSCAR: Orchestrating STT-RAM cache traffic for heterogeneous CPU-GPU architectures

J Zhan, O Kayıran, GH Loh, CR Das… - 2016 49th annual IEEE …, 2016 - ieeexplore.ieee.org
As we integrate data-parallel GPUs with general-purpose CPUs on a single chip, the
enormous cache traffic generated by GPUs will not only exhaust the limited cache capacity …

LTRF: Enabling high-capacity register files for GPUs via hardware/software cooperative register prefetching

M Sadrosadati, A Mirhosseini, SB Ehsani… - ACM SIGPLAN …, 2018 - dl.acm.org
Graphics Processing Units (GPUs) employ large register files to accommodate all active
threads and accelerate context switching. Unfortunately, register files are a scalability …

A survey of architectural approaches for improving GPGPU performance, programmability and heterogeneity

M Khairy, AG Wassal, M Zahran - Journal of Parallel and Distributed …, 2019 - Elsevier
With the skyrocketing advances of process technology, the increased need to process huge
amount of data, and the pivotal need for power efficiency, the usage of Graphics Processing …