Architectural support for address translation on GPUs: Designing memory management units for CPU/GPUs with unified address spaces
B Pichai, L Hsu, A Bhattacharjee - ACM SIGARCH Computer Architecture …, 2014 - dl.acm.org
The proliferation of heterogeneous compute platforms, of which CPU/GPU is a prevalent
example, necessitates a manageable programming model to ensure widespread adoption …
Memory interference characterization between CPU cores and integrated GPUs in mixed-criticality platforms
Most of today's mixed criticality platforms feature Systems on Chip (SoC) where a multi-core
CPU complex (the host) competes with an integrated Graphic Processor Unit (iGPU, the …
Efficient GPU synchronization without scopes: Saying no to complex consistency models
As GPUs have become increasingly general purpose, applications with more general
sharing patterns and fine-grained synchronization have started to emerge. Unfortunately …
A survey of architectural approaches for improving GPGPU performance, programmability and heterogeneity
With the skyrocketing advances of process technology, the increased need to process huge
amount of data, and the pivotal need for power efficiency, the usage of Graphics Processing …
Exploring memory consistency for massively-threaded throughput-oriented processors
BA Hechtman, DJ Sorin - Proceedings of the 40th Annual International …, 2013 - dl.acm.org
We re-visit the issue of hardware consistency models in the new context of massively-
threaded throughput-oriented processors (MTTOPs). A prominent example of an MTTOP is a …
Only buffer when you need to: Reducing on-chip GPU traffic with reconfigurable local atomic buffers
In recent years, due to their wide availability and ease of programming, GPUs have emerged
as the accelerator of choice for a wide variety of applications including graph analytics and …
Code generation for embedded heterogeneous architectures on Android
The success of Android is based on its unified Java programming model that allows to write
platform-independent programs for a variety of different target platforms. However, this …
An efficient sequential consistency implementation with dynamic race detection for GPUs
As GPUs are being used for general purpose computations, applications with different
memory access requirements have emerged. In spite of the growing demand, only few GPU …
Address translation for throughput-oriented accelerators
B Pichai, L Hsu, A Bhattacharjee - IEEE Micro, 2015 - ieeexplore.ieee.org
With processor vendors embracing hardware heterogeneity, providing low overhead
hardware and software abstractions to support easy-to-use programming models is a critical …
Fusion coherence: scalable cache coherence for heterogeneous kilo-core system
Future heterogeneous systems will integrate CPUs and GPUs on a single chip to achieve
high computing performance as well as high throughput. In general, it would discard the …