Hetero-mark, a benchmark suite for CPU-GPU collaborative computing
Graphics Processing Units (GPUs) can easily outperform CPUs in processing large-scale
data parallel workloads, but are considered weak in processing serialized tasks and …
Fluidic kernels: Cooperative execution of opencl programs on multiple heterogeneous devices
P Pandit, R Govindarajan - … IEEE/ACM International Symposium on Code …, 2014 - dl.acm.org
Programming heterogeneous computing systems with Graphics Processing Units (GPU) and
multi-core CPUs in them is complex and time-consuming. OpenCL has emerged as an …
Design space exploration of on-chip ring interconnection for a CPU–GPU heterogeneous architecture
Incorporating a GPU architecture into CMP, which is more efficient with certain types of
applications, is a popular architecture trend in recent processors. This heterogeneous mix of …
Die-stacked memory device providing data translation
(57) ABSTRACT A die-stacked memory device incorporates a data translation controller at
one or more logic dies of the device to provide data translation services for data to be stored …
A survey of techniques for managing and leveraging caches in GPUs
S Mittal - Journal of Circuits, Systems, and Computers, 2014 - World Scientific
Initially introduced as special-purpose accelerators for graphics applications, graphics
processing units (GPUs) have now emerged as general purpose computing platforms for a …
In-cache query co-processing on coupled CPU-GPU architectures
Recently, there have been some emerging processor designs that the CPU and the GPU
(Graphics Processing Unit) are integrated in a single chip and share Last Level Cache …
Stacked memory device with metadata management
(65) Prior Publication Data: US 2014/0040698 A1, Feb. 6, 2014. Primary Examiner: Sam Rizk.
(57) ABSTRACT A processing system comprises one or more processor …
Die-stacked memory device with reconfigurable logic
A die-stacked memory device incorporates a reconfigurable logic device to provide
implementation flexibility in performing various data manipulation operations and other …
Warped-preexecution: A GPU pre-execution approach for improving latency hiding
This paper presents a pre-execution approach for improving GPU performance, called P-
mode (pre-execution mode). GPUs utilize a number of concurrent threads for hiding …
Spare register aware prefetching for graph algorithms on GPUs
More and more graph algorithms are being GPU enabled. Graph algorithm implementations
on GPUs have irregular control flow and are memory-intensive with many irregular/data …