Title | Authors | Venue | Cited by | Year |
Spotserve: Serving generative large language models on preemptible instances | X Miao, C Shi, J Duan, X Xi, D Lin, B Cui, Z Jia | Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 54 | 2024 |
SKVQ: Sliding-window Key and Value Cache Quantization for Large Language Models | H Duanmu, Z Yuan, X Li, J Duan, X Zhang, D Lin | First Conference on Language Modeling (COLM 24), 2024 | 18 | 2024 |
Centauri: Enabling efficient scheduling for communication-computation overlap in large model training via communication partitioning | C Chen, X Li, Q Zhu, J Duan, P Sun, X Zhang, C Yang | Proceedings of the 29th ACM International Conference on Architectural …, 2024 | 15 | 2024 |
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention | Q Zhu, J Duan, C Chen, S Liu, X Li, G Feng, X Lv, H Cao, X Chuanfu, ... | arXiv preprint arXiv:2406.15486, 2024 | 12* | 2024 |
Efficient training of large language models on distributed infrastructures: a survey | J Duan, S Zhang, Z Wang, L Jiang, W Qu, Q Hu, G Wang, Q Weng, H Yan, ... | arXiv preprint arXiv:2407.20018, 2024 | 9 | 2024 |
Parcae: Proactive, Liveput-Optimized DNN Training on Preemptible Instances | J Duan, Z Song, X Miao, X Xi, D Lin, H Xu, M Zhang, Z Jia | 21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024 | 7 | 2024 |
MuxServe: Flexible Spatial-Temporal Multiplexing for Multiple LLM Serving | J Duan, R Lu, H Duanmu, X Li, X Zhang, D Lin, I Stoica, H Zhang | Forty-first International Conference on Machine Learning, 2024 | 6* | 2024 |
Proteus: Simulating the performance of distributed DNN training | J Duan, X Li, P Xu, X Zhang, S Yan, Y Liang, D Lin | IEEE Transactions on Parallel and Distributed Systems, 2024 | 4 | 2024 |