A survey of deep learning techniques for neural machine translation S Yang, Y Wang, X Chu arXiv preprint arXiv:2002.07526, 2020 | 216 | 2020 |
A distributed synchronous SGD algorithm with global top-k sparsification for low bandwidth networks S Shi, Q Wang, K Zhao, Z Tang, Y Wang, X Huang, X Chu 2019 IEEE 39th International Conference on Distributed Computing Systems …, 2019 | 176 | 2019 |
The impact of GPU DVFS on the energy and performance of deep learning: An empirical study Z Tang, Y Wang, Q Wang, X Chu Proceedings of the Tenth ACM International Conference on Future Energy …, 2019 | 91 | 2019 |
Benchmarking the performance and energy efficiency of AI accelerators for AI training Y Wang, Q Wang, S Shi, X He, Z Tang, K Zhao, X Chu 2020 20th IEEE/ACM International Symposium on Cluster, Cloud and Internet …, 2020 | 84 | 2020 |
FusionAI: Decentralized training and deploying LLMs with massive consumer-level GPUs Z Tang, Y Wang, X He, L Zhang, X Pan, Q Wang, R Zeng, K Zhao, S Shi, ... arXiv preprint arXiv:2309.01172, 2023 | 28 | 2023 |
BurstGPT: A Real-world Workload Dataset to Optimize LLM Serving Systems Y Wang, Y Chen, Z Li, X Kang, Z Tang, X He, R Guo, X Wang, Q Wang, ... arXiv preprint arXiv:2401.17644, 2024 | 21* | 2024 |
Computer-aided clinical skin disease diagnosis using CNN and object detection models X He, S Wang, S Shi, Z Tang, Y Wang, Z Zhao, J Dai, R Ni, X Zhang, X Liu, ... 2019 IEEE International Conference on Big Data (Big Data), 4839-4844, 2019 | 15 | 2019 |
NAS-LID: Efficient neural architecture search with local intrinsic dimension X He, J Yao, Y Wang, Z Tang, KC Cheung, S See, B Han, X Chu Proceedings of the AAAI Conference on Artificial Intelligence 37 (6), 7839-7847, 2023 | 11 | 2023 |
FedML Parrot: A scalable federated learning system via heterogeneity-aware scheduling on sequential and hierarchical training Z Tang, X Chu, RY Ran, S Lee, S Shi, Y Zhang, Y Wang, AQ Liang, ... arXiv preprint arXiv:2303.01778, 2023 | 10 | 2023 |
Fault-Tolerant Hybrid-Parallel Training at Scale with Reliable and Efficient In-memory Checkpointing Y Wang, X Kang, S Shi, X He, Z Tang, X Pan, Y Zheng, X Wu, AC Zhou, ... arXiv preprint arXiv:2310.12670, 2024 | 9* | 2024 |
Energy-efficient Inference Service of Transformer-based Deep Learning Models on GPUs Y Wang, Q Wang, X Chu 2020 International Conferences on IEEE Green Computing and Communications …, 2020 | 6 | 2020 |
Energy-efficient Online Scheduling of Transformer Inference Services on GPU Servers Y Wang, Q Wang, X Chu IEEE Transactions on Green Communications and Networking, 2022 | 3 | 2022 |
ExpertFlow: Optimized expert activation and token allocation for efficient mixture-of-experts inference X He, S Zhang, Y Wang, H Yin, Z Zeng, S Shi, Z Tang, X Chu, I Tsang, ... arXiv preprint arXiv:2410.17954, 2024 | 2 | 2024 |
FusionLLM: A decentralized LLM training system on geo-distributed GPUs with adaptive compression Z Tang, X Kang, Y Yin, X Pan, Y Wang, X He, Q Wang, R Zeng, K Zhao, ... arXiv preprint arXiv:2410.12707, 2024 | 2 | 2024 |
Bandwidth-Aware and Overlap-Weighted Compression for Communication-Efficient Federated Learning Z Tang, J Huang, R Yan, Y Wang, Z Tang, S Shi, AC Zhou, X Chu Proceedings of the 53rd International Conference on Parallel Processing, 866-875, 2024 | 1 | 2024 |
DreamDDP: Accelerating Data Parallel Distributed LLM Training with Layer-wise Scheduled Partial Synchronization BL Zhenheng Tang, Zichen Tang, Junlin Huang, Xinglin Pan, Rudan Yan, Yuxin ... https://arxiv.org/abs/2502.11058, 2025 | | 2025 |