Characterization of large language model development in the datacenter Q Hu, Z Ye, Z Wang, G Wang, M Zhang, Q Chen, P Sun, D Lin, X Wang, ... 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024 | 32 | 2024 |
Lucid: A non-intrusive, scalable and interpretable scheduler for deep learning training jobs Q Hu, M Zhang, P Sun, Y Wen, T Zhang Proceedings of the 28th ACM International Conference on Architectural …, 2023 | 25 | 2023 |
Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters Q Hu, Z Ye, M Zhang, Q Chen, P Sun, Y Wen, T Zhang 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023 | 9 | 2023 |
FedDSE: Distribution-aware Sub-model Extraction for Federated Learning over Resource-constrained Devices H Wang, Y Jia, M Zhang, Q Hu, H Ren, P Sun, Y Wen, T Zhang Proceedings of the ACM on Web Conference 2024, 2902-2913, 2024 | 7 | 2024 |
Boosting distributed full-graph GNN training with asynchronous one-bit communication M Zhang, Q Hu, P Sun, Y Wen, T Zhang arXiv preprint arXiv:2303.01277, 2023 | 7 | 2023 |
Sylvie: 3D-adaptive and universal system for large-scale graph neural network training M Zhang, Q Hu, C Wan, H Wang, P Sun, Y Wen, T Zhang 2024 IEEE 40th International Conference on Data Engineering (ICDE), 3823-3836, 2024 | 2 | 2024 |
TorchGT: A Holistic System for Large-Scale Graph Transformer Training M Zhang, J Sun, Q Hu, P Sun, Z Wang, Y Wen, T Zhang SC24: International Conference for High Performance Computing, Networking …, 2024 | | 2024 |