[HTML] A survey on hardware accelerators: Taxonomy, trends, challenges, and perspectives

B Peccerillo, M Mannino, A Mondelli… - Journal of Systems …, 2022 - Elsevier
In recent years, the limits of the multicore approach emerged in the so-called “dark silicon”
issue and diminishing returns of an ever-increasing core count. Hardware manufacturers …

Transparent offloading and mapping (TOM): enabling programmer-transparent near-data processing in GPU systems

K Hsieh, E Ebrahimi, G Kim, N Chatterjee… - ACM SIGARCH …, 2016 - dl.acm.org
Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-
chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to …

The gem5 simulator: Version 20.0+

J Lowe-Power, AM Ahmad, A Akram, M Alian… - arXiv preprint arXiv …, 2020 - arxiv.org
The open-source and community-supported gem5 simulator is one of the most popular tools
for computer architecture research. This simulation infrastructure allows researchers to …

Co-designing accelerators and SoC interfaces using gem5-Aladdin

YS Shao, SL Xi, V Srinivasan, GY Wei… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
Increasing demand for power-efficient, high-performance computing has spurred a growing
number and diversity of hardware accelerators in mobile and server Systems on Chip …

CoNDA: Efficient cache coherence support for near-data accelerators

A Boroumand, S Ghose, M Patel, H Hassan… - Proceedings of the 46th …, 2019 - dl.acm.org
Specialized on-chip accelerators are widely used to improve the energy efficiency of
computing systems. Recent advances in memory technology have enabled near-data …

MOESI-prime: preventing coherence-induced hammering in commodity workloads

K Loughlin, S Saroiu, A Wolman, YA Manerkar… - Proceedings of the 49th …, 2022 - dl.acm.org
Prior work shows that Rowhammer attacks (which flip bits in DRAM via frequent activations
of the same row(s)) are viable. Adversaries typically mount these attacks via instruction …

Decoupled direct memory access: Isolating CPU and IO traffic by leveraging a dual-data-port DRAM

D Lee, L Subramanian… - 2015 International …, 2015 - ieeexplore.ieee.org
Memory channel contention is a critical performance bottleneck in modern systems that have
highly parallelized processing units operating on large data sets. The memory channel is …

Understanding co-running behaviors on integrated CPU/GPU architectures

F Zhang, J Zhai, B He, S Zhang… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Architecture designers tend to integrate both CPUs and GPUs on the same chip to deliver
energy-efficient designs. It is still an open problem to effectively leverage the advantages of …

Amdahl's law in the context of heterogeneous many-core systems – a survey

MAN Al-hayanni, F Xia, A Rafiev… - IET Computers & …, 2020 - Wiley Online Library
For over 50 years, Amdahl's Law has been the hallmark model for reasoning about
performance bounds for homogeneous parallel computing resources. As heterogeneous …

[BOOK][B] General-purpose graphics processor architectures

Originally developed to support video games, graphics processor units (GPUs) are now
increasingly used for general-purpose (non-graphics) applications ranging from machine …