Zhuomin He
Unknown affiliation
Verified email at sjtu.edu.cn
Title
Cited by
Year
Cost-Efficient large language model serving for multi-turn conversations with CachedAttention
B Gao, Z He, P Sharma, Q Kang, D Jevdjic, J Deng, X Yang, Z Yu, P Zuo
2024 USENIX Annual Technical Conference (USENIX ATC 24), 111-126, 2024
Cited by 17; 2024
An FSO tracking system for Gaussian beams
F Wang, T Cheng, A Xu, Z He, P Jiang, B Zhu
2020 IEEE 6th International Conference on Computer and Communications (ICCC …, 2020
Cited by 4; 2020
AdaSkip: Adaptive Sublayer Skipping for Accelerating Long-Context LLM Inference
Z He, Y Yao, P Zuo, B Gao, Q Li, Z Zheng, F Wu
arXiv preprint arXiv:2501.02336, 2025
2025
IMI: In-memory Multi-job Inference Acceleration for Large Language Models
B Gao, Z Wang, Z He, T Luo, WF Wong, Z Zhou
Proceedings of the 53rd International Conference on Parallel Processing, 752-761, 2024
2024