Merak: An efficient distributed DNN training framework with automated 3D parallelism for giant foundation models. Z Lai, S Li, X Tang, K Ge, W Liu, Y Duan, L Qiao, D Li. IEEE Transactions on Parallel and Distributed Systems 34(5), 1466-1478, 2023. Cited by 42.
AutoPipe: A fast pipeline parallelism approach with balanced partitioning and micro-batch slicing. W Liu, Z Lai, S Li, Y Duan, K Ge, D Li. 2022 IEEE International Conference on Cluster Computing (CLUSTER), 301-312, 2022. Cited by 21.
HPDL: Towards a general framework for high-performance distributed deep learning. D Li, Z Lai, K Ge, Y Zhang, Z Zhang, Q Wang, H Wang. 2019 IEEE 39th International Conference on Distributed Computing Systems …, 2019. Cited by 21.
An efficient ADMM-based algorithm to nonconvex penalized support vector machines. L Guan, L Qiao, D Li, T Sun, K Ge, X Lu. 2018 IEEE International Conference on Data Mining Workshops (ICDMW), 1209-1216, 2018. Cited by 21.
An efficient parallel and distributed solution to nonconvex penalized linear SVMs. L Guan, T Sun, L Qiao, Z Yang, D Li, K Ge, X Lu. Frontiers of Information Technology & Electronic Engineering 21, 587-603, 2020. Cited by 17.
HPH: Hybrid parallelism on heterogeneous clusters for accelerating large-scale DNNs training. Y Duan, Z Lai, S Li, W Liu, K Ge, P Liang, D Li. 2022 IEEE International Conference on Cluster Computing (CLUSTER), 313-323, 2022. Cited by 13.
Efficient parallel implementation of a density peaks clustering algorithm on graphics processing unit. K Ge, H Su, D Li, X Lu. Frontiers of Information Technology & Electronic Engineering 18(7), 915-927, 2017. Cited by 9.
Deep discriminative clustering network. X Shao, K Ge, H Su, L Luo, B Peng, D Li. 2018 International Joint Conference on Neural Networks (IJCNN), 1-7, 2018. Cited by 8.
Prophet: Fine-grained load balancing for parallel training of large-scale MoE models. W Wang, Z Lai, S Li, W Liu, K Ge, Y Liu, A Shen, D Li. 2023 IEEE International Conference on Cluster Computing (CLUSTER), 82-94, 2023. Cited by 7.
Accelerate distributed deep learning with cluster-aware sketch quantization. K Ge, Y Zhang, Y Fu, Z Lai, X Deng, D Li. Science China Information Sciences 66(6), 162102, 2023. Cited by 5.
Automated tensor model parallelism with overlapped communication for efficient foundation model training. S Li, Z Lai, Y Hao, W Liu, K Ge, X Deng, D Li, K Lu. arXiv preprint arXiv:2305.16121, 2023. Cited by 5.
Advances of pipeline model parallelism for deep learning training: An overview. L Guan, DS Li, JY Liang, WJ Wang, KS Ge, XC Lu. Journal of Computer Science and Technology 39(3), 567-584, 2024. Cited by 4.
S2 Reducer: High-performance sparse communication to accelerate distributed deep learning. K Ge, Y Fu, Y Zhang, Z Lai, X Deng, D Li. ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and …, 2022. Cited by 4.
A multidimensional communication scheduling method for hybrid parallel DNN training. S Li, K Lu, Z Lai, W Liu, K Ge, D Li. IEEE Transactions on Parallel and Distributed Systems, 2024. Cited by 3.
Compressed collective sparse-sketch for distributed data-parallel training of deep learning models. K Ge, K Lu, Y Fu, X Deng, Z Lai, D Li. IEEE Journal on Selected Areas in Communications 41(4), 941-963, 2023. Cited by 3.
Auto-divide GNN: Accelerating GNN training with subgraph division. H Chen, Z Ran, K Ge, Z Lai, J Jiang, D Li. European Conference on Parallel Processing, 367-382, 2023. Cited by 1.
BRGraph: An efficient graph neural network training system by reusing batch data on GPU. K Ge, Z Ran, Z Lai, L Zhang, D Li. Concurrency and Computation: Practice and Experience 34(15), e6961, 2022. Cited by 1.
CASQ: Accelerate distributed deep learning with sketch-based gradient quantization. K Ge, Y Zhang, Y Fu, Z Lai, X Deng, D Li. 2021 IEEE International Conference on Cluster Computing (CLUSTER), 825-826, 2021. Cited by 1.
Efficient deep neural network training via decreasing precision with layer capacity. A Shen, Z Lai, T Sun, S Li, K Ge, W Liu, D Li. Frontiers of Computer Science 19(10), 1910355, 2025.
AutoPipe-H: A heterogeneity-aware data-paralleled pipeline approach on commodity GPU servers. W Liu, K Lu, Z Lai, S Li, K Ge, D Li, X Lu. IEEE Transactions on Computers, 2024.