NeuPIMs: NPU-PIM Heterogeneous Acceleration for Batched LLM Inferencing. G Heo, S Lee, J Cho, H Choi, S Lee, H Ham, G Kim, D Mahajan, J Park. Proceedings of the 29th ACM International Conference on Architectural …, 2024. Cited by: 29.
ONNXim: A Fast, Cycle-level Multi-core NPU Simulator. H Ham, W Yang, Y Shin, O Woo, G Heo, S Lee, J Park, G Kim. IEEE Computer Architecture Letters, 2024. Cited by: 3.
LLMServingSim: A HW/SW Co-Simulation Infrastructure for LLM Inference Serving at Scale. J Cho, M Kim, H Choi, G Heo, J Park. 2024 IEEE International Symposium on Workload Characterization (IISWC), 15-29, 2024. Cited by: 3.
Accelerating String-key Learned Index Structures via Memoization-based Incremental Training. M Kim, J Hwang, G Heo, S Cho, D Mahajan, J Park. arXiv preprint arXiv:2403.11472, 2024. Cited by: 2.
Efficient LLM Inference with Activation Checkpointing and Hybrid Caching. S Lee, H Kim, S Hwang, G Heo, M Noh, J Huh. arXiv preprint arXiv:2501.01792, 2025.