OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance
Emerging GPGPU architectures, along with programming models like CUDA and OpenCL,
offer a cost-effective platform for many applications by providing high thread level …
Neither more nor less: Optimizing thread-level parallelism for GPGPUs
General-purpose graphics processing units (GPGPUs) are at their best in accelerating
computation by exploiting abundant thread-level parallelism (TLP) offered by many classes …
Scale-out processors
Scale-out datacenters mandate high per-server throughput to get the maximum benefit from
the large TCO investment. Emerging applications (e.g., data serving and web search) that run …
Managing GPU concurrency in heterogeneous architectures
Heterogeneous architectures consisting of general-purpose CPUs and throughput-
optimized GPUs are projected to be the dominant computing platforms for many classes of …
Adapt-NoC: A flexible network-on-chip design for heterogeneous manycore architectures
The increased computational capability in heterogeneous manycore architectures facilitates
the concurrent execution of many applications. This requires, among other things, a flexible …
On-chip communication network for efficient training of deep convolutional networks on heterogeneous manycore systems
Convolutional Neural Networks (CNNs) have shown a great deal of success in diverse
application domains including computer vision, speech recognition, and natural language …
Learning-based application-agnostic 3D NoC design for heterogeneous manycore systems
The rising use of deep learning and other big-data algorithms has led to an increasing
demand for hardware platforms that are computationally powerful, yet energy-efficient. Due …
NoC architectures for silicon interposer systems: Why pay for more wires when you can get them (from your interposer) for free?
Silicon interposer technology ("2.5D" stacking) enables the integration of multiple memory
stacks with a processor chip, thereby greatly increasing in-package memory capacity while …
Design space exploration of on-chip ring interconnection for a CPU–GPU heterogeneous architecture
Incorporating a GPU architecture into CMP, which is more efficient with certain types of
applications, is a popular architecture trend in recent processors. This heterogeneous mix of …
HNOCS: modular open-source simulator for heterogeneous NoCs
We present HNOCS (Heterogeneous Network-on-Chip Simulator), an open-source NoC
simulator based on OMNeT++. To the best of our knowledge, HNOCS is the first simulator to …