Vortex: Extending the RISC-V ISA for GPGPU and 3D-graphics

B Tine, KP Yalamarthy, F Elsabbagh… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
The importance of open-source hardware and software has been increasing. However,
despite GPUs being one of the more popular accelerators across various applications, there …

KRISP: Enabling kernel-wise right-sizing for spatial partitioned GPU inference servers

M Chow, A Jahanshahi, D Wong - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Machine learning (ML) inference workloads present significantly different challenges than
ML training workloads. Typically, inference workloads are shorter running and under-utilize …

Further Closing the GAP: Improving the Accuracy of gem5's GPU Models

V Ramadas, D Kouchekinia… - 6th Young Architects' …, 2024 - pages.cs.wisc.edu
The breakdown in Moore's Law and Dennard Scaling is leading to drastic changes in the
makeup and constitution of computing systems. For example, a single chip integrates 10 …

Global Optimizations & Lightweight Dynamic Logic for Concurrency

S Pati, S Aga, N Jayasena, MD Sinclair - arXiv preprint arXiv:2409.02227, 2024 - arxiv.org
Modern accelerators like GPUs are increasingly executing independent operations
concurrently to improve the device's compute utilization. However, effectively harnessing it …

Simulating Machine Learning Models at Scale

V Ramadas, MD Sinclair - SRC TECHCON, 2024 - pages.cs.wisc.edu
In recent years deep neural networks (DNNs) have emerged as an important application
domain driving the requirements for future systems. As DNNs get more sophisticated, their …

T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives

S Pati, S Aga, M Islam, N Jayasena… - Proceedings of the 29th …, 2024 - dl.acm.org
Large Language Models increasingly rely on distributed techniques for their training and
inference. These techniques require communication across devices which can reduce …

Performance analysis and modeling for quantum computing simulation on distributed GPU platforms

A Ahmadzadeh, H Sarbazi-Azad - Quantum Information Processing, 2024 - Springer
Quantum computing holds great promise for accelerating computational tasks, but quantum computers are
still not accessible. To fill this gap, quantum computing simulators have been widely used for …

Simulation Support for Fast and Accurate Large-Scale GPGPU & Accelerator Workloads

V Ramadas, M Poremba, B Beckmann… - Third Workshop on …, 2024 - pages.cs.wisc.edu
In recent years deep neural networks (DNNs) have emerged as an important application
domain driving the requirements for future systems. As DNNs get more sophisticated, their …

Adding MFMA Support to gem5

M Kurzynski, MD Sinclair - arXiv preprint arXiv:2501.18113, 2025 - arxiv.org
In this work we have enhanced gem5's GPU model support to add Matrix Core Engines
(MCEs). Specifically, on the AMD MI200 and MI300 GPUs that gem5 supports, these MCEs …

gem5 GPU accuracy profiler (GAP)

C Jamieson, A Chandrashekar… - Proc 4th gem5 …, 2022 - pages.cs.wisc.edu
In recent years, we have been enhancing and updating gem5's GPU support [1]. First, we
have enhanced gem5's GPU support for ML workloads such that gem5 can now run [2] …