Suchita Pati
Verified email at cs.wisc.edu
Title
Cited by
Year
Analyzing machine learning workloads using a detailed GPU simulator
J Lew, DA Shah, S Pati, S Cattell, M Zhang, A Sandhupatla, C Ng, N Goli, ...
2019 IEEE international symposium on performance analysis of systems and …, 2019
Cited by 92, 2019
Demystifying BERT: System design implications
S Pati, S Aga, N Jayasena, MD Sinclair
2022 IEEE International Symposium on Workload Characterization (IISWC), 296-309, 2022
Cited by 29, 2022
SeqPoint: Identifying representative iterations of sequence-based neural networks
S Pati, S Aga, MD Sinclair, N Jayasena
2020 IEEE International Symposium on Performance Analysis of Systems and …, 2020
Cited by 17, 2020
Demystifying BERT: Implications for accelerator design
S Pati, S Aga, N Jayasena, MD Sinclair
arXiv preprint arXiv:2104.08335, 2021
Cited by 15, 2021
T3: Transparent Tracking & Triggering for Fine-grained Overlap of Compute & Collectives
S Pati, S Aga, M Islam, N Jayasena, MD Sinclair
Proceedings of the 29th ACM International Conference on Architectural …, 2024
Cited by 12, 2024
Tale of Two Cs: Computation vs. Communication Scaling for Future Transformers on Future Hardware
S Pati, S Aga, M Islam, N Jayasena, MD Sinclair
2023 IEEE International Symposium on Workload Characterization (IISWC), 140-153, 2023
Cited by 12, 2023
JIT-Q: Just-in-time Quantization with Processing-In-Memory for Efficient ML Training
M Ibrahim, S Aga, A Li, S Pati, M Islam
Proceedings of Machine Learning and Systems 6, 46-59, 2024
Cited by 6*, 2024
Computation vs. communication scaling for future transformers on future hardware
S Pati, S Aga, M Islam, N Jayasena, MD Sinclair
arXiv preprint arXiv:2302.02825, 2023
Cited by 6, 2023
Improving GPU Utilization in ML Workloads Through Finer-Grained Synchronization
R Kuper, S Pati, MD Sinclair
3rd Young Architects Workshop, 2021
Cited by 5, 2021
DARTS: Performance-counter driven sampling using binary translators
R Kumar, S Pati, K Lahiri
2017 IEEE International Symposium on Performance Analysis of Systems and …, 2017
Cited by 4, 2017
Analyzing Machine Learning Workloads Using a Detailed GPU Simulator
J Lew, D Shah, S Pati, S Cattell, M Zhang, A Sandhupatla, C Ng, N Goli, ...
arXiv preprint arXiv:1811.08933, 2018
Cited by 3, 2018
Global Optimizations & Lightweight Dynamic Logic for Concurrency
S Pati, S Aga, N Jayasena, M Sinclair
https://arxiv.org/pdf/2409.02227, 2024
Cited by 1, 2024
Exploring GPU Architectural Optimizations for RNNs
S Pati
Young Architect Workshop (YArch), in conjunction with HPCA'19, 2019
Cited by 1, 2019
Optimizing ML Concurrent Computation and Communication with GPU DMA Engines
A Agrawal, S Aga, S Pati, M Islam
arXiv preprint arXiv:2412.14335, 2024
2024
Dynamic control of work scheduling
S Pati, S Aga, N Jayasena, MD Sinclair
US Patent App. 18/091,443, 2024
2024
Fused Data Generation and Associated Communication
SD Aga, S Pati, NS Jayasena
US Patent App. 18/190,620, 2024
2024
Cross-Stack Optimizations for Sequence-Based Models on GPUs
S Pati
https://www.proquest.com/docview/3054333891, 2024
2024
IISWC 2024
A Jog, A Hankin, A Samajdar, A Putnam, A Shriraman, A Mishra, B Asgari, ...
Effective Prefetching for Multicore/Multiprocessor Systems
S Pati, P Mahapatra
Transparent Compression for Flash SSDs
S Pati, Y Trivedi