GhOST: a GPU out-of-order scheduling technique for stall reduction

I Chaturvedi, BR Godala, Y Wu, Z Xu… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) use massive multi-threading coupled with static
scheduling to hide instruction latencies. Despite this, memory instructions pose a challenge …

GCoM: a detailed GPU core model for accurate analytical modeling of modern GPUs

J Lee, Y Ha, S Lee, J Woo, J Lee, H Jang… - Proceedings of the 49th …, 2022 - dl.acm.org
Analytical models can greatly help computer architects perform orders of magnitude faster
early-stage design space exploration than using cycle-level simulators. To facilitate rapid …

Specializing coherence, consistency, and push/pull for GPU graph analytics

G Salvador, WH Darvin, M Huzaifa… - … Analysis of Systems …, 2020 - ieeexplore.ieee.org
This work explores the interaction of three communication-centric design dimensions for
graph workloads on emerging integrated CPU-GPU systems: update propagation with and …

GPU domain specialization via composable on-package architecture

Y Fu, E Bolotin, N Chatterjee, D Nellans… - ACM Transactions on …, 2021 - dl.acm.org
As GPUs scale their low-precision matrix math throughput to boost deep learning (DL)
performance, they upset the balance between math throughput and memory system …

Evaluating and mitigating bandwidth bottlenecks across the memory hierarchy in GPUs

S Dublish, V Nagarajan… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
GPUs are often limited by off-chip memory bandwidth. With the advent of general-purpose
computing on GPUs, a cache hierarchy has been introduced to filter the bandwidth demand …

HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs

J Yang, M Wen, D Chen, Z Chen, Z Xue… - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
The widespread adoption of GPUs has driven the development of GPU simulators, which, in
turn, lead advancements in both GPU architectures and software optimization. Trace-driven …

GCStack: A GPU Cycle Accounting Mechanism for Providing Accurate Insight into GPU Performance

H Cha, S Lee, Y Ha, H Jang, J Kim… - IEEE Computer …, 2024 - ieeexplore.ieee.org
Cycles Per Instruction (CPI) stacks help computer architects gain insight into the
performance of their target architectures and applications. To bring the benefits of CPI stacks …

Towards controlled single-molecule manipulation using “real-time” molecular dynamics simulation: A GPU implementation

D Van Vreumingen, S Tewari, F Verbeek… - Micromachines, 2018 - mdpi.com
Molecular electronics saw its birth with the idea to build electronic circuitry with single
molecules as individual components. Even though commercial applications are still modest …

Efficient coherence and consistency for specialized memory hierarchies

MD Sinclair - 2017 - ideals.illinois.edu
As the benefits from transistor scaling slow down, specialization is becoming increasingly
important for a wide range of applications. Although traditional heterogeneous systems work …

Improving Fetch and Issue Bandwidth in the Vortex GPU

LM Aurud - 2023 - ntnuopen.ntnu.no
Software simulation is a widely used method for researching computer architectures.
Unfortunately, it is slow, especially for larger parallel architectures such as GPUs. A detailed …