OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance

A Jog, O Kayiran, N Chidambaram Nachiappan… - ACM SIGPLAN …, 2013 - dl.acm.org
Emerging GPGPU architectures, along with programming models like CUDA and OpenCL,
offer a cost-effective platform for many applications by providing high thread level …

Neither more nor less: Optimizing thread-level parallelism for GPGPUs

O Kayıran, A Jog, MT Kandemir… - Proceedings of the 22nd …, 2013 - ieeexplore.ieee.org
General-purpose graphics processing units (GPGPUs) are at their best in accelerating
computation by exploiting abundant thread-level parallelism (TLP) offered by many classes …

Scale-out processors

P Lotfi-Kamran, B Grot, M Ferdman, S Volos… - ACM SIGARCH …, 2012 - dl.acm.org
Scale-out datacenters mandate high per-server throughput to get the maximum benefit from
the large TCO investment. Emerging applications (e.g., data serving and web search) that run …

Managing GPU concurrency in heterogeneous architectures

O Kayiran, NC Nachiappan, A Jog… - 2014 47th annual …, 2014 - ieeexplore.ieee.org
Heterogeneous architectures consisting of general-purpose CPUs and throughput-
optimized GPUs are projected to be the dominant computing platforms for many classes of …

Adapt-NoC: A flexible network-on-chip design for heterogeneous manycore architectures

H Zheng, K Wang, A Louri - 2021 IEEE international symposium …, 2021 - ieeexplore.ieee.org
The increased computational capability in heterogeneous manycore architectures facilitates
the concurrent execution of many applications. This requires, among other things, a flexible …

On-chip communication network for efficient training of deep convolutional networks on heterogeneous manycore systems

W Choi, K Duraisamy, RG Kim… - IEEE Transactions …, 2017 - ieeexplore.ieee.org
Convolutional Neural Networks (CNNs) have shown a great deal of success in diverse
application domains including computer vision, speech recognition, and natural language …

Learning-based application-agnostic 3D NoC design for heterogeneous manycore systems

BK Joardar, RG Kim, JR Doppa… - IEEE Transactions …, 2018 - ieeexplore.ieee.org
The rising use of deep learning and other big-data algorithms has led to an increasing
demand for hardware platforms that are computationally powerful, yet energy-efficient. Due …

NoC architectures for silicon interposer systems: Why pay for more wires when you can get them (from your interposer) for free?

NE Jerger, A Kannan, Z Li… - 2014 47th Annual IEEE …, 2014 - ieeexplore.ieee.org
Silicon interposer technology ("2.5D" stacking) enables the integration of multiple memory
stacks with a processor chip, thereby greatly increasing in-package memory capacity while …

Design space exploration of on-chip ring interconnection for a CPU–GPU heterogeneous architecture

J Lee, S Li, H Kim, S Yalamanchili - Journal of Parallel and Distributed …, 2013 - Elsevier
Incorporating a GPU architecture into CMP, which is more efficient with certain types of
applications, is a popular architecture trend in recent processors. This heterogeneous mix of …

HNOCS: modular open-source simulator for heterogeneous NoCs

Y Ben-Itzhak, E Zahavi, I Cidon… - … on embedded computer …, 2012 - ieeexplore.ieee.org
We present HNOCS (Heterogeneous Network-on-Chip Simulator), an open-source NoC
simulator based on OMNeT++. To the best of our knowledge, HNOCS is the first simulator to …