Accel-Sim: An extensible simulation framework for validated GPU modeling

M Khairy, Z Shen, TM Aamodt… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
In computer architecture, significant innovation frequently comes from industry. However, the
simulation tools used by industry are often not released for open use, and even when they …

LLMCompass: Enabling efficient hardware design for large language model inference

H Zhang, A Ning, RB Prabhakar… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
The past year has witnessed the increasing popularity of Large Language Models (LLMs).
Their unprecedented scale and associated high hardware cost have impeded their broader …

Ferroelectric ternary content addressable memories for energy-efficient associative search

X Yin, Y Qian, M Imani, K Ni, C Li… - … on Computer-Aided …, 2022 - ieeexplore.ieee.org
A fast and efficient search function across the database has been a core component for a
number of data-intensive tasks in machine learning, IoT applications, and inference …

Need for speed: Experiences building a trustworthy system-level GPU simulator

O Villa, D Lustig, Z Yan, E Bolotin, Y Fu… - … Symposium on High …, 2021 - ieeexplore.ieee.org
The demands of high-performance computing (HPC) and machine learning (ML) workloads
have resulted in the rapid architectural evolution of GPUs over the last decade. The growing …

A hardware evaluation framework for large language model inference

H Zhang, A Ning, R Prabhakar, D Wentzlaff - arXiv preprint arXiv …, 2023 - arxiv.org
The past year has witnessed the increasing popularity of Large Language Models (LLMs).
Their unprecedented scale and associated high hardware cost have impeded their broader …

NaviSim: A highly accurate GPU simulator for AMD RDNA GPUs

Y Bao, Y Sun, Z Feric, MT Shen, M Weston… - Proceedings of the …, 2022 - dl.acm.org
As GPUs continue to grow in popularity for accelerating demanding applications, such as
high-performance computing and machine learning, GPU architects need to deliver more …

CUDA Flux: A lightweight instruction profiler for CUDA applications

L Braun, H Fröning - 2019 IEEE/ACM Performance Modeling …, 2019 - ieeexplore.ieee.org
GPUs are powerful, massively parallel processors, which require a vast amount of thread
parallelism to keep their thousands of execution units busy, and to tolerate latency when …

Exploring modern GPU memory system design challenges through accurate modeling

M Khairy, J Akshay, T Aamodt, TG Rogers - arXiv preprint arXiv …, 2018 - arxiv.org
This paper explores the impact of simulator accuracy on architecture design decisions in the
general-purpose graphics processing unit (GPGPU) space. We perform a detailed …

GPUCloudSim: an extension of CloudSim for modeling and simulation of GPUs in cloud data centers

A Siavashi, M Momtazpour - The Journal of Supercomputing, 2019 - Springer
Recent years have witnessed an increasing growth in the usage of GPUs in cloud data
centers. It is known that conventional virtualization techniques are not directly applicable to …

Daisen: A framework for visualizing detailed GPU execution

Y Sun, Y Zhang, A Mosallaei, MD Shah… - Computer Graphics …, 2021 - Wiley Online Library
Graphics Processing Units (GPUs) have been widely used to accelerate artificial
intelligence, physics simulation, medical imaging, and information visualization applications …