DistServe: Disaggregating prefill and decoding for goodput-optimized large language model serving Y Zhong, S Liu, J Chen, J Hu, Y Zhu, X Liu, X Jin, H Zhang 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024 | 123 | 2024 |
Fast distributed inference serving for large language models B Wu, Y Zhong, Z Zhang, S Liu, F Liu, Y Sun, G Huang, X Liu, X Jin arXiv preprint arXiv:2305.05920, 2023 | 88 | 2023 |
LoongServe: Efficiently serving long-context large language models with elastic sequence parallelism B Wu, S Liu, Y Zhong, P Sun, X Liu, X Jin Proceedings of the ACM SIGOPS 30th Symposium on Operating Systems Principles …, 2024 | 26 | 2024 |
RLHFuse: Efficient RLHF Training for Large Language Models with Inter- and Intra-Stage Fusion Y Zhong, Z Zhang, B Wu, S Liu, Y Chen, C Wan, H Hu, L Xia, R Ming, ... arXiv preprint arXiv:2409.13221, 2024 | 4 | 2024 |
SwiftLLM: A tiny yet powerful LLM inference system tailored for researching purpose S Liu https://github.com/interestingLSY/swiftLLM, 2024 | | 2024 |