GhOST: a GPU out-of-order scheduling technique for stall reduction
Graphics Processing Units (GPUs) use massive multi-threading coupled with static
scheduling to hide instruction latencies. Despite this, memory instructions pose a challenge …
GCoM: a detailed GPU core model for accurate analytical modeling of modern GPUs
Analytical models can greatly help computer architects perform orders of magnitude faster
early-stage design space exploration than using cycle-level simulators. To facilitate rapid …
Specializing coherence, consistency, and push/pull for GPU graph analytics
G Salvador, WH Darvin, M Huzaifa… - … Analysis of Systems …, 2020 - ieeexplore.ieee.org
This work explores the interaction of three communication-centric design dimensions for
graph workloads on emerging integrated CPU-GPU systems: update propagation with and …
GPU domain specialization via composable on-package architecture
As GPUs scale their low-precision matrix math throughput to boost deep learning (DL)
performance, they upset the balance between math throughput and memory system …
Evaluating and mitigating bandwidth bottlenecks across the memory hierarchy in GPUs
GPUs are often limited by off-chip memory bandwidth. With the advent of general-purpose
computing on GPUs, a cache hierarchy has been introduced to filter the bandwidth demand …
HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs
J Yang, M Wen, D Chen, Z Chen, Z Xue… - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
The widespread adoption of GPUs has driven the development of GPU simulators, which, in
turn, lead advancements in both GPU architectures and software optimization. Trace-driven …
GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight into GPU Performance
Cycles Per Instruction (CPI) stacks help computer architects gain insight into the
performance of their target architectures and applications. To bring the benefits of CPI stacks …
Towards controlled single-molecule manipulation using “real-time” molecular dynamics simulation: A GPU implementation
Molecular electronics saw its birth with the idea to build electronic circuitry with single
molecules as individual components. Even though commercial applications are still modest …
Efficient coherence and consistency for specialized memory hierarchies
MD Sinclair - 2017 - ideals.illinois.edu
As the benefits from transistor scaling slow down, specialization is becoming increasingly
important for a wide range of applications. Although traditional heterogeneous systems work …
Improving Fetch and Issue Bandwidth in the Vortex GPU
LM Aurud - 2023 - ntnuopen.ntnu.no
Software simulation is a widely used method for researching computer architectures.
Unfortunately, it is slow, especially for larger parallel architectures such as GPUs. A detailed …