Accel-Sim: An extensible simulation framework for validated GPU modeling M Khairy, Z Shen, TM Aamodt, TG Rogers 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture …, 2020 | 326* | 2020 |
AccelWattch: A power modeling framework for modern GPUs V Kandiah, S Peverelle, M Khairy, J Pan, A Manjunath, TG Rogers, ... MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture …, 2021 | 106 | 2021 |
Efficient utilization of GPGPU cache hierarchy M Khairy, M Zahran, AG Wassal Proceedings of the 8th Workshop on General Purpose Processing using GPUs, 36-47, 2015 | 41 | 2015 |
A survey of architectural approaches for improving GPGPU performance, programmability and heterogeneity M Khairy, AG Wassal, M Zahran Journal of Parallel and Distributed Computing 127, 65-88, 2019 | 36 | 2019 |
Locality-centric data and threadblock management for massive GPUs M Khairy, V Nikiforov, D Nellans, TG Rogers 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture …, 2020 | 35 | 2020 |
Principal kernel analysis: A tractable methodology to simulate scaled GPU workloads C Avalos Baddouh, M Khairy, RN Green, M Payer, TG Rogers MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture …, 2021 | 29 | 2021 |
A quantitative evaluation of contemporary GPU simulation methodology A Jain, M Khairy, TG Rogers Proceedings of the ACM on Measurement and Analysis of Computing Systems 2 (2 …, 2018 | 20 | 2018 |
SACAT: Streaming-aware conflict-avoiding thrashing-resistant GPGPU cache management scheme M Khairy, M Zahran, A Wassal IEEE Transactions on Parallel and Distributed Systems 28 (6), 1740-1753, 2016 | 13 | 2016 |
SIMR: Single Instruction Multiple Request Processing for Energy-Efficient Data Center Microservices M Khairy, A Alawneh, A Barnes, TG Rogers 2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO), 441-463, 2022 | 7 | 2022 |
TPU vs GPU vs Cerebras vs Graphcore: A fair comparison between ML hardware M Khairy https://khairy2011.medium.com/tpu-vs-gpu-vs-cerebras-vs-graphcore-a-fair …, 2020 | 7 | 2020 |
SST_GPU: An Execution-Driven CUDA Kernel Scheduler and Streaming-Multiprocessor Compute Model. M Khairy, M Zhang, R Green, SD Hammond, RJ Hoekstra, T Rogers, ... Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2019 | 5 | 2019 |
System and methods for single instruction multiple request processing TG Rogers, M Khairy US Patent App. 18/072,492, 2023 | 2 | 2023 |
A SIMT Analyzer for Multi-Threaded CPU Applications A Alawneh, M Khairy, TG Rogers 2022 IEEE International Symposium on Performance Analysis of Systems and …, 2022 | 1 | 2022 |
ThreadFuser: A SIMT Analysis Framework for MIMD Programs A Alawneh, N Kang, M Khairy, TG Rogers 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO …, 2024 | | 2024 |
An Academic’s Attempt to Clear the Fog of the Machine Learning Accelerator War M Khairy, T Rogers https://www.sigarch.org/an-academics-attempt-to-clear-the-fog-of-the-machine …, 2021 | | 2021 |
Balar: A SST GPU Component for Performance Modeling and Profiling C Hughes, SD Hammond, RJ Hoekstra, M Zhang, M Khairy, T Rogers, ... Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), 2019 | | 2019 |
ISPASS 2023 A Ferreron, A Samajdar, A Gutierrez, A Shriraman, A Rodrigues, B Asgari, ... | | |
SANDIA REPORT M Zhang, M Khairy, T Rogers | | |