RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization

V Seshadri, Y Kim, C Fallin, D Lee… - Proceedings of the 46th …, 2013 - dl.acm.org
Several system-level operations trigger bulk data copy or initialization. Even though these
bulk data operations do not require any computation, current systems transfer a large …

A survey on agent-based simulation using hardware accelerators

J …

… of data parallel programs to OpenCL for heterogeneous systems

D Grewe, Z Wang, MFP O'Boyle - Proceedings of the 2013 …, 2013 - ieeexplore.ieee.org
General purpose GPU based systems are highly attractive as they give potentially massive
performance at little cost. Realizing such potential is challenging due to the complexity of …

Kernelet: High-throughput GPU kernel executions with dynamic slicing and scheduling

J Zhong, B He - IEEE Transactions on Parallel and Distributed …, 2013 - ieeexplore.ieee.org
Graphics processors, or GPUs, have recently been widely used as accelerators in shared
environments such as clusters and clouds. In such shared environments, many kernels are …

Accelerating applications using edge tensor processing units

KC Hsu, HW Tseng - Proceedings of the International Conference for …, 2021 - dl.acm.org
Neural network (NN) accelerators have been integrated into a wide-spectrum of computer
systems to accommodate the rapidly growing demands for artificial intelligence (AI) and …

Characterizing and evaluating a key-value store application on heterogeneous CPU-GPU systems

TH Hetherington, TG Rogers, L Hsu… - … Analysis of Systems …, 2012 - ieeexplore.ieee.org
The recent use of graphics processing units (GPUs) in several top supercomputers
demonstrates their ability to consistently deliver positive results in high-performance …