CoNDA: Efficient cache coherence support for near-data accelerators
Specialized on-chip accelerators are widely used to improve the energy efficiency of
computing systems. Recent advances in memory technology have enabled near-data …
Beyond the socket: NUMA-aware GPUs
GPUs achieve high throughput and power efficiency by employing many small single
instruction multiple thread (SIMT) cores. To minimize scheduling logic and performance …
Combining HW/SW mechanisms to improve NUMA performance of multi-GPU systems
Historically, improvement in GPU performance has been tightly coupled with transistor
scaling. As Moore's Law slows down, performance of single GPUs may ultimately plateau …
A formal analysis of the NVIDIA PTX memory consistency model
D. Lustig, S. Sahasrabuddhe, O. Giroux - Proceedings of the Twenty …, 2019 - dl.acm.org
This paper presents the first formal analysis of the official memory consistency model for the
NVIDIA PTX virtual ISA. Like other GPU memory models, the PTX memory model is weakly …
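To make the consequence of that weak ordering concrete, here is a minimal message-passing sketch of my own (not drawn from the paper); it assumes CUDA 11.7+ for libcu++ cuda::atomic_ref, a GPU built for sm_70 or newer, and an illustrative kernel name message_passing. The release store and acquire load are what lower to PTX st.release/ld.acquire and keep the consumer thread from seeing the flag set while still reading a stale payload:

    #include <cuda/atomic>
    #include <cstdio>

    // Thread 0 publishes a payload and releases a flag; thread 1 acquires the
    // flag before reading the payload. Dropping the memory orders would allow
    // thread 1, under the weak PTX model, to see flag == 1 yet read stale data.
    __global__ void message_passing(int *payload, int *flag, int *out) {
        cuda::atomic_ref<int, cuda::thread_scope_device> f(*flag);
        if (threadIdx.x == 0) {
            *payload = 42;                                   // ordinary store
            f.store(1, cuda::std::memory_order_release);     // publish
        } else {
            while (f.load(cuda::std::memory_order_acquire) == 0)
                ;                                            // spin until published
            *out = *payload;                                 // guaranteed to read 42
        }
    }

    int main() {
        int *buf;                                            // payload, flag, out
        cudaMallocManaged(&buf, 3 * sizeof(int));
        buf[0] = buf[1] = buf[2] = 0;
        message_passing<<<1, 2>>>(&buf[0], &buf[1], &buf[2]);
        cudaDeviceSynchronize();
        printf("consumer read payload = %d\n", buf[2]);      // expect 42
        cudaFree(buf);
        return 0;
    }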
Chronos: Efficient speculative parallelism for accelerators
We present Chronos, a framework to build accelerators for applications with speculative
parallelism. These applications consist of atomic tasks, sometimes with order constraints …
Demystifying BERT: System design implications
Transfer learning in natural language processing (NLP) uses increasingly large models that
tackle challenging problems. Consequently, these applications are driving the requirements …
Spandex: A flexible interface for efficient heterogeneous coherence
Recent heterogeneous architectures have trended toward tighter integration and shared
memory largely due to the efficient communication and programmability enabled by this …
Selective GPU caches to eliminate CPU-GPU HW cache coherence
Cache coherence is ubiquitous in shared memory multiprocessors because it provides a
simple, high performance memory abstraction to programmers. Recent work suggests …
HMG: Extending cache coherence protocols across modern hierarchical multi-GPU systems
Prior work on GPU cache coherence has shown that simple hardware- or software-based
protocols can be more than sufficient. However, in recent years, features such as multi-chip …
Chasing away RAts: Semantics and evaluation for relaxed atomics on heterogeneous systems
An unambiguous and easy-to-understand memory consistency model is crucial for ensuring
correct synchronization and guiding future design of heterogeneous systems. In a widely …
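For a sense of the code this debate is about, here is a small sketch of my own (not from the paper; it assumes CUDA 11.7+ for libcu++ cuda::atomic_ref and an illustrative kernel name count_matches). The per-thread increments must be atomic, but nothing else needs to be ordered around them, so memory_order_relaxed suffices; pinning down what programmers may legitimately conclude from such relaxed operations on heterogeneous hardware is the kind of question the paper examines:

    #include <cuda/atomic>
    #include <cstdio>

    // Each matching thread bumps a shared counter. The increment must be
    // atomic, but it needs no ordering with surrounding accesses, so the
    // relaxed memory order is sufficient and cheaper than the seq_cst default.
    __global__ void count_matches(const int *data, int n, int key, int *counter) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n && data[i] == key) {
            cuda::atomic_ref<int, cuda::thread_scope_device> c(*counter);
            c.fetch_add(1, cuda::std::memory_order_relaxed);
        }
    }

    int main() {
        const int n = 1 << 20;
        int *data, *counter;
        cudaMallocManaged(&data, n * sizeof(int));
        cudaMallocManaged(&counter, sizeof(int));
        for (int i = 0; i < n; ++i) data[i] = i % 4;         // keys 0..3
        *counter = 0;
        count_matches<<<(n + 255) / 256, 256>>>(data, n, /*key=*/3, counter);
        cudaDeviceSynchronize();
        printf("matches for key 3: %d (expected %d)\n", *counter, n / 4);
        cudaFree(data);
        cudaFree(counter);
        return 0;
    }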