Vortex: Extending the RISC-V ISA for GPGPU and 3D-graphics
The importance of open-source hardware and software has been increasing. However,
despite GPUs being one of the more popular accelerators across various applications, there …
KRISP: Enabling kernel-wise right-sizing for spatial partitioned gpu inference servers
Machine learning (ML) inference workloads present significantly different challenges than
ML training workloads. Typically, inference workloads are shorter running and under-utilize …
Further Closing the GAP: Improving the Accuracy of gem5's GPU Models
V Ramadas, D Kouchekinia… - 6th Young Architects' …, 2024 - pages.cs.wisc.edu
The breakdown in Moore's Law and Dennard Scaling is leading to drastic changes in the
makeup and constitution of computing systems. For example, a single chip integrates 10 …
Global Optimizations & Lightweight Dynamic Logic for Concurrency
Modern accelerators like GPUs are increasingly executing independent operations
concurrently to improve the device's compute utilization. However, effectively harnessing it …
Simulating Machine Learning Models at Scale
V Ramadas, MD Sinclair - SRC TECHCON, 2024 - pages.cs.wisc.edu
In recent years deep neural networks (DNNs) have emerged as an important application
domain driving the requirements for future systems. As DNNs get more sophisticated, their …
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
Large Language Models increasingly rely on distributed techniques for their training and
inference. These techniques require communication across devices which can reduce …
Performance analysis and modeling for quantum computing simulation on distributed GPU platforms
Quantum computing holds great promise for accelerating computational tasks, but they are
still not accessible. To fill this gap, quantum computing simulators have been widely used for …
Simulation Support for Fast and Accurate Large-Scale GPGPU & Accelerator Workloads
In recent years deep neural networks (DNNs) have emerged as an important application
domain driving the requirements for future systems. As DNNs get more sophisticated, their …
Adding MFMA Support to gem5
M Kurzynski, MD Sinclair - arXiv preprint arXiv:2501.18113, 2025 - arxiv.org
In this work we have enhanced gem5's GPU model support to add Matrix Core Engines
(MCEs). Specifically, on the AMD MI200 and MI300 GPUs that gem5 supports, these MCEs …
gem5 GPU accuracy profiler (GAP)
C Jamieson, A Chandrashekar… - Proc 4th gem5 …, 2022 - pages.cs.wisc.edu
In recent years, we have been enhancing and updating gem5's GPU support [1]. First, we
have enhanced gem5's GPU support for ML workloads such that gem5 can now run [2] …