- Academic Search

Accel-Sim: An extensible simulation framework for validated GPU modeling

M Khairy, Z Shen, TM Aamodt… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org

In computer architecture, significant innovation frequently comes from industry. However, the
simulation tools used by industry are often not released for open use, and even when they …

Cited by 303

PHI: Architectural support for synchronization- and bandwidth-efficient commutative scatter updates

A Mukkara, N Beckmann, D Sanchez - … of the 52nd Annual IEEE/ACM …, 2019 - dl.acm.org

Many applications perform frequent scatter update operations to large data structures. For
example, in push-style graph algorithms, processing each vertex requires updating the data …

Cited by 83

[BOOK] Shared-memory synchronization

ML Scott, T Brown - 2013 - Springer

This monograph grows out of nearly 40 years of experience in synchronization and
concurrent data structures. Though written primarily from the perspective of systems …

Cited by 133

A formal analysis of the NVIDIA PTX memory consistency model

D Lustig, S Sahasrabuddhe, O Giroux - Proceedings of the Twenty …, 2019 - dl.acm.org

This paper presents the first formal analysis of the official memory consistency model for the
NVIDIA PTX virtual ISA. Like other GPU memory models, the PTX memory model is weakly …

Cited by 74

SEP-graph: finding shortest execution paths for graph processing under a hybrid framework on GPU

H Wang, L Geng, R Lee, K Hou, Y Zhang… - Proceedings of the 24th …, 2019 - dl.acm.org

In general, the performance of parallel graph processing is determined by three pairs of
critical parameters, namely synchronous or asynchronous execution mode (Sync or Async) …

Cited by 61

Täkō: A polymorphic cache hierarchy for general-purpose optimization of data movement

BC Schwedock, P Yoovidhya, J Seibert… - Proceedings of the 49th …, 2022 - dl.acm.org

Current systems hide data movement from software behind the load-store interface.
Software's inability to observe and respond to data movement is the root cause of many …

Cited by 25

Spandex: A flexible interface for efficient heterogeneous coherence

J Alsop, M Sinclair, S Adve - 2018 ACM/IEEE 45th Annual …, 2018 - ieeexplore.ieee.org

Recent heterogeneous architectures have trended toward tighter integration and shared
memory largely due to the efficient communication and programmability enabled by this …

Cited by 63

CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization

P Dalmia, RS Kumar, MD Sinclair - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org

Chiplets are transforming computer system designs, allowing system designers to combine
heterogeneous computing resources at unprecedented scales. Breaking larger, monolithic …

Cited by 2

Inter-kernel reuse-aware thread block scheduling

M Huzaifa, J Alsop, A Mahmoud, G Salvador… - ACM Transactions on …, 2020 - dl.acm.org

As GPUs have become more programmable, their performance and energy benefits have
made them increasingly popular. However, while GPU compute units continue to improve in …

Cited by 24

Photon: A fine-grained sampled simulation methodology for GPU workloads

C Liu, Y Sun, TE Carlson - Proceedings of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org

GPUs, due to their massively-parallel computing architectures, provide high performance for
data-parallel applications. However, existing GPU simulators are too slow to enable …

Cited by 6

Chasing away RAts: Semantics and evaluation for relaxed atomics on heterogeneous systems

Accel-Sim: An extensible simulation framework for validated GPU modeling
