Cost-Efficient Large Language Model Serving for Multi-Turn Conversations with CachedAttention. B Gao, Z He, P Sharma, Q Kang, D Jevdjic, J Deng, X Yang, Z Yu, P Zuo. 2024 USENIX Annual Technical Conference (USENIX ATC 24), 111-126, 2024. Cited by 17.
An FSO Tracking System for Gaussian Beams. F Wang, T Cheng, A Xu, Z He, P Jiang, B Zhu. 2020 IEEE 6th International Conference on Computer and Communications (ICCC …, 2020. Cited by 4.
AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference. Z He, Y Yao, P Zuo, B Gao, Q Li, Z Zheng, F Wu. arXiv preprint arXiv:2501.02336, 2025.
IMI: In-memory Multi-job Inference Acceleration for Large Language Models. B Gao, Z Wang, Z He, T Luo, WF Wong, Z Zhou. Proceedings of the 53rd International Conference on Parallel Processing, 752-761, 2024.