Hongyi Jin
Verified email at andrew.cmu.edu
Title | Cited by | Year
Towards efficient generative large language model serving: A survey from algorithms to systems
X Miao, G Oliaro, Z Zhang, X Cheng, H Jin, T Chen, Z Jia
arXiv preprint arXiv:2312.15234, 2023
Cited by 70 | 2023
TensorIR: An abstraction for automatic tensorized program optimization
S Feng, B Hou, H Jin, W Lin, J Shao, R Lai, Z Ye, L Zheng, CH Yu, Y Yu, ...
Proceedings of the 28th ACM International Conference on Architectural …, 2023
Cited by 70 | 2023
Tensor program optimization with probabilistic programs
J Shao, X Zhou, S Feng, B Hou, R Lai, H Jin, W Lin, M Masuda, CH Yu, ...
Advances in Neural Information Processing Systems 35, 35783-35796, 2022
Cited by 28 | 2022
Accelerating self-attentions for LLM serving with FlashInfer
Z Ye, L Chen, R Lai, Y Zhao, S Zheng, J Shao, B Hou, H Jin, Y Zuo, L Yin, ...
URL https://flashinfer.ai/2024/02/02/introduce-flashinfer.html, 2024
Cited by 11 | 2024
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
R Lai, J Shao, S Feng, SS Lyubomirsky, B Hou, W Lin, Z Ye, H Jin, Y Jin, ...
arXiv preprint arXiv:2311.02103, 2023
Cited by 9 | 2023
WebLLM: A High-Performance In-Browser LLM Inference Engine
CF Ruan, Y Qin, X Zhou, R Lai, H Jin, Y Dong, B Hou, MS Yu, Y Zhai, ...
arXiv preprint arXiv:2412.15803, 2024
2024
A System for Microserving of LLMs
H Jin, R Lai, CF Ruan, Y Wang, TC Mowry, X Miao, Z Jia, T Chen
arXiv preprint arXiv:2412.12488, 2024
2024