Qinghao Hu
Title · Cited by · Year
Characterization and Prediction of Deep Learning Workloads in Large-Scale GPU Datacenters
Q Hu, P Sun, S Yan, Y Wen, T Zhang
Proceedings of the International Conference for High Performance Computing …, 2021
Cited by 137 · 2021
Deep learning workload scheduling in GPU datacenters: Taxonomy, challenges and vision
W Gao, Q Hu, Z Ye, P Sun, X Wang, Y Luo, T Zhang, Y Wen
arXiv preprint arXiv:2205.11913, 2022
Cited by 35 · 2022
Characterization of large language model development in the datacenter
Q Hu, Z Ye, Z Wang, G Wang, M Zhang, Q Chen, P Sun, D Lin, X Wang, ...
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024
Cited by 32 · 2024
LongVILA: Scaling long-context visual language models for long videos
F Xue, Y Chen, D Li, Q Hu, L Zhu, X Li, Y Fang, H Tang, S Yang, Z Liu, ...
arXiv preprint arXiv:2408.10188, 2024
Cited by 29 · 2024
Lucid: A non-intrusive, scalable and interpretable scheduler for deep learning training jobs
Q Hu, M Zhang, P Sun, Y Wen, T Zhang
Proceedings of the 28th ACM International Conference on Architectural …, 2023
Cited by 25 · 2023
Deep learning workload scheduling in GPU datacenters: A survey
Z Ye, W Gao, Q Hu, P Sun, X Wang, Y Luo, T Zhang, Y Wen
ACM Computing Surveys 56 (6), 1-38, 2024
Cited by 21 · 2024
DeltaZip: Multi-tenant language model serving via delta compression
X Yao, A Klimovic
arXiv preprint arXiv:2312.05215, 2023
Cited by 11 · 2023
LoongTrain: Efficient training of long-sequence LLMs with head-context parallelism
D Gu, P Sun, Q Hu, T Huang, X Chen, Y Xiong, G Wang, Q Chen, S Zhao, ...
arXiv preprint arXiv:2406.18485, 2024
Cited by 10 · 2024
Hydro: Surrogate-Based Hyperparameter Tuning Service in Datacenters
Q Hu, Z Ye, M Zhang, Q Chen, P Sun, Y Wen, T Zhang
17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023
Cited by 9 · 2023
FedDSE: Distribution-aware Sub-model Extraction for Federated Learning over Resource-constrained Devices
H Wang, Y Jia, M Zhang, Q Hu, H Ren, P Sun, Y Wen, T Zhang
Proceedings of the ACM on Web Conference 2024, 2902-2913, 2024
Cited by 7 · 2024
InternEvo: Efficient long-sequence large language model training via hybrid parallelism and redundant sharding
Q Chen, D Gu, G Wang, X Chen, YT Xiong, T Huang, Q Hu, X Jin, Y Wen, ...
arXiv preprint arXiv:2401.09149, 2024
Cited by 7 · 2024
Boosting distributed full-graph GNN training with asynchronous one-bit communication
M Zhang, Q Hu, P Sun, Y Wen, T Zhang
arXiv preprint arXiv:2303.01277, 2023
Cited by 7 · 2023
Efficient training of large language models on distributed infrastructures: a survey
J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu, G Wang, Q Weng, H Yan, ...
arXiv preprint arXiv:2407.20018, 2024
Cited by 5 · 2024
Primo: Practical Learning-Augmented Systems with Interpretable Models
Q Hu, H Nori, P Sun, Y Wen, T Zhang
2022 USENIX Annual Technical Conference (USENIX ATC 22), 519-538, 2022
Cited by 4 · 2022
Sylvie: 3D-adaptive and universal system for large-scale graph neural network training
M Zhang, Q Hu, C Wan, H Wang, P Sun, Y Wen, T Zhang
2024 IEEE 40th International Conference on Data Engineering (ICDE), 3823-3836, 2024
Cited by 2 · 2024
AMSP: Super-Scaling LLM Training via Advanced Model States Partitioning
Q Chen, Q Hu, Z Ye, G Wang, P Sun, Y Wen, T Zhang
arXiv preprint arXiv:2311.00257, 2023
Cited by 1 · 2023
TorchGT: A Holistic System for Large-Scale Graph Transformer Training
M Zhang, J Sun, Q Hu, P Sun, Z Wang, Y Wen, T Zhang
SC24: International Conference for High Performance Computing, Networking …, 2024
2024
Lins: Reducing Communication Overhead of ZeRO for Efficient LLM Training
Q Chen, Q Hu, G Wang, Y Xiong, T Huang, X Chen, Y Gao, H Yan, Y Wen, ...
2024 IEEE/ACM 32nd International Symposium on Quality of Service (IWQoS), 1-10, 2024
2024
Building efficient and practical machine learning systems
Q Hu
Nanyang Technological University, 2023
2023
Understanding the Workload Characteristics of Large Language Model Development
Q Hu, P Sun, T Zhang