[HTML] A survey on hardware accelerators: Taxonomy, trends, challenges, and perspectives
In recent years, the limits of the multicore approach emerged in the so-called “dark silicon”
issue and diminishing returns of an ever-increasing core count. Hardware manufacturers …
Transparent offloading and mapping (TOM) enabling programmer-transparent near-data processing in GPU systems
Main memory bandwidth is a critical bottleneck for modern GPU systems due to limited off-
chip pin bandwidth. 3D-stacked memory architectures provide a promising opportunity to …
The gem5 simulator: Version 20.0+
The open-source and community-supported gem5 simulator is one of the most popular tools
for computer architecture research. This simulation infrastructure allows researchers to …
Co-designing accelerators and SoC interfaces using gem5-Aladdin
Increasing demand for power-efficient, high-performance computing has spurred a growing
number and diversity of hardware accelerators in mobile and server Systems on Chip …
CoNDA: Efficient cache coherence support for near-data accelerators
Specialized on-chip accelerators are widely used to improve the energy efficiency of
computing systems. Recent advances in memory technology have enabled near-data …
MOESI-prime: preventing coherence-induced hammering in commodity workloads
Prior work shows that Rowhammer attacks---which flip bits in DRAM via frequent activations
of the same row(s)---are viable. Adversaries typically mount these attacks via instruction …
Decoupled direct memory access: Isolating CPU and IO traffic by leveraging a dual-data-port DRAM
Memory channel contention is a critical performance bottleneck in modern systems that have
highly parallelized processing units operating on large data sets. The memory channel is …
Understanding co-running behaviors on integrated CPU/GPU architectures
Architecture designers tend to integrate both CPUs and GPUs on the same chip to deliver
energy-efficient designs. It is still an open problem to effectively leverage the advantages of …
Amdahl's law in the context of heterogeneous many-core systems: a survey
For over 50 years, Amdahl's Law has been the hallmark model for reasoning about
performance bounds for homogeneous parallel computing resources. As heterogeneous …
[BOOK] General-purpose graphics processor architectures
Originally developed to support video games, graphics processor units (GPUs) are now
increasingly used for general-purpose (non-graphics) applications ranging from machine …