Characterizing CUDA unified memory (UM)-aware MPI designs on modern GPU architectures KV Manian, AA Ammar, A Ruhela, CH Chu, H Subramoni, DK Panda Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, 43-52, 2019 | 24 | 2019 |
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2? AA Awan, KV Manian, CH Chu, H Subramoni, DK Panda Parallel Computing 85, 141-152, 2019 | 22 | 2019 |
Analyzing and understanding the impact of interconnect performance on HPC, Big Data, and deep learning applications: a case study with InfiniBand EDR and HDR A Ruhela, S Xu, KV Manian, H Subramoni, DK Panda 2020 IEEE International Parallel and Distributed Processing Symposium …, 2020 | 14 | 2020 |
OMB-UM: Design, implementation, and evaluation of CUDA unified memory aware MPI benchmarks KV Manian, CH Chu, AA Awan, KS Khorassani, H Subramoni, DK Panda 2019 IEEE/ACM Performance Modeling, Benchmarking and Simulation of High …, 2019 | 5 | 2019 |
Core frequency adjustment to optimize Time Warp on many-core processors P Putnam, PA Wilsey, KV Manian Simulation Modelling Practice and Theory 28, 55-64, 2012 | 5 | 2012 |
Distributed simulation on a many-core processor KV Manian, P Wilsey Proceedings of SIMUL 2011: Third Conference on Advances in System Simulation, 2011 | 5 | 2011 |
Novel Methods to Improve the Energy Efficiency of Multi-core Synchronization Primitives KV Manian University of Cincinnati, 2017 | | 2017 |