Ching-Hsiang Chu
Research Scientist, Meta/Facebook
Verified email at meta.com - Homepage
Title
Cited by
Year
Software-hardware co-design for fast and scalable training of deep learning recommendation models
D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch, S Sridharan, X Liu, ...
Proceedings of the 49th Annual International Symposium on Computer …, 2022
172*, 2022
The MVAPICH project: Transforming research into high-performance MPI library for HPC community
DK Panda, H Subramoni, CH Chu, M Bayatpour
Journal of Computational Science 52, 101208, 2021
82, 2021
Scalable distributed DNN training using TensorFlow and CUDA-aware MPI: Characterization, designs, and performance evaluation
AA Awan, J Bédorf, CH Chu, H Subramoni, DK Panda
2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2019
62, 2019
Optimized broadcast for deep learning workloads on dense-GPU InfiniBand clusters: MPI or NCCL?
AA Awan, CH Chu, H Subramoni, DK Panda
Proceedings of the 25th European MPI Users' Group Meeting, 1-9, 2018
59, 2018
NV-Group: Link-efficient reduction for distributed deep learning on modern dense GPU systems
CH Chu, P Kousha, AA Awan, KS Khorassani, H Subramoni, DK Panda
Proceedings of the 34th ACM International Conference on Supercomputing, 1-12, 2020
51, 2020
OC-DNN: Exploiting advanced unified memory capabilities in CUDA 9 and Volta GPUs for out-of-core DNN training
AA Awan, CH Chu, H Subramoni, X Lu, DK Panda
2018 IEEE 25th International Conference on High Performance Computing (HiPC …, 2018
41, 2018
Designing high-performance MPI libraries with on-the-fly compression for modern GPU clusters
Q Zhou, C Chu, NS Kumar, P Kousha, SM Ghazimirsaeed, H Subramoni, ...
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021
37, 2021
High-performance, distributed training of large-scale deep learning recommendation models
D Mudigere, Y Hao, J Huang, A Tulloch, S Sridharan, X Liu, M Ozdal, ...
arXiv preprint arXiv:2104.05158, 2021
37, 2021
CUDA kernel based collective reduction operations on large-scale GPU clusters
CH Chu, K Hamidouche, A Venkatesh, AA Awan, DK Panda
2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid …, 2016
32, 2016
Exploiting GPUDirect RDMA in designing high-performance OpenSHMEM for NVIDIA GPU clusters
K Hamidouche, A Venkatesh, AA Awan, H Subramoni, CH Chu, ...
2015 IEEE International Conference on Cluster Computing, 78-87, 2015
31, 2015
Performance evaluation of MPI libraries on GPU-enabled OpenPOWER architectures: Early experiences
KS Khorassani, CH Chu, H Subramoni, DK Panda
High Performance Computing: ISC High Performance 2019 International …, 2019
29, 2019
Improving SCTP performance by jitter-based congestion control over wired-wireless networks
JM Chen, CH Chu, EHK Wu, MF Tsai, JR Wang
EURASIP Journal on Wireless Communications and Networking 2011, 1-13, 2011
28, 2011
Designing a ROCm-aware MPI library for AMD GPUs: early experiences
K Shafie Khorassani, J Hashmi, CH Chu, CC Chen, H Subramoni, ...
International Conference on High Performance Computing, 118-136, 2021
24, 2021
Designing a profiling and visualization tool for scalable and in-depth analysis of high-performance GPU clusters
P Kousha, B Ramesh, KK Suresh, CH Chu, A Jain, N Sarkauskas, ...
2019 IEEE 26th International Conference on High Performance Computing, Data …, 2019
24, 2019
Characterizing CUDA unified memory (UM)-aware MPI designs on modern GPU architectures
KV Manian, AA Ammar, A Ruhela, CH Chu, H Subramoni, DK Panda
Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, 43-52, 2019
24, 2019
Efficient and scalable multi-source streaming broadcast on GPU clusters for deep learning
CH Chu, X Lu, AA Awan, H Subramoni, J Hashmi, B Elton, DK Panda
2017 46th International Conference on Parallel Processing (ICPP), 161-170, 2017
24, 2017
Communication profiling and characterization of deep-learning workloads on clusters with high-performance interconnects
AA Awan, A Jain, CH Chu, H Subramoni, DK Panda
IEEE Micro 40 (1), 35-43, 2019
23, 2019
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2?
AA Awan, KV Manian, CH Chu, H Subramoni, DK Panda
Parallel Computing 85, 141-152, 2019
22, 2019
Better together: Jointly optimizing ML collective scheduling and execution planning using SYNDICATE
K Mahajan, CH Chu, S Sridharan, A Akella
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023
18, 2023
Exploiting hardware multicast and GPUDirect RDMA for efficient broadcast
CH Chu, X Lu, AA Awan, H Subramoni, B Elton, DK Panda
IEEE Transactions on Parallel and Distributed Systems 30 (3), 575-588, 2018
18, 2018
Articles 1-20