Ruihang Lai
Verified email at cs.cmu.edu
Title
Cited by
Year
SparseTIR: Composable abstractions for sparse compilation in deep learning
Z Ye, R Lai, J Shao, T Chen, L Ceze
Proceedings of the 28th ACM International Conference on Architectural …, 2023
Cited by 82, 2023
TensorIR: An abstraction for automatic tensorized program optimization
S Feng, B Hou, H Jin, W Lin, J Shao, R Lai, Z Ye, L Zheng, CH Yu, Y Yu, ...
Proceedings of the 28th ACM International Conference on Architectural …, 2023
Cited by 73, 2023
Tensor program optimization with probabilistic programs
J Shao, X Zhou, S Feng, B Hou, R Lai, H Jin, W Lin, M Masuda, CH Yu, ...
Advances in Neural Information Processing Systems 35, 35783-35796, 2022
Cited by 28, 2022
Accelerating self-attentions for LLM serving with FlashInfer
Z Ye, L Chen, R Lai, Y Zhao, S Zheng, J Shao, B Hou, H Jin, Y Zuo, L Yin, ...
URL https://flashinfer.ai/2024/02/02/introduce-flashinfer.html, 2024
Cited by 13*, 2024
Cascade inference: Memory bandwidth efficient shared prefix batch decoding
Z Ye, R Lai, BR Lu, CY Lin, S Zheng, L Chen, T Chen, L Ceze
Cited by 13, 2024
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning
R Lai, J Shao, S Feng, SS Lyubomirsky, B Hou, W Lin, Z Ye, H Jin, Y Jin, ...
arXiv preprint arXiv:2311.02103, 2023
Cited by 10, 2023
XGrammar: Flexible and efficient structured generation engine for large language models
Y Dong, CF Ruan, Y Cai, R Lai, Z Xu, Y Zhao, T Chen
arXiv preprint arXiv:2411.15100, 2024
Cited by 4, 2024
FlashInfer: Efficient and customizable attention engine for LLM inference serving
Z Ye, L Chen, R Lai, W Lin, Y Zhang, S Wang, T Chen, B Kasikci, V Grover, ...
arXiv preprint arXiv:2501.01005, 2025
Cited by 2, 2025
Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development
S Feng, J Liu, R Lai, CF Ruan, Y Yu, L Zhang, T Chen
arXiv preprint arXiv:2404.09151, 2024
Cited by 1, 2024
WebLLM: A High-Performance In-Browser LLM Inference Engine
CF Ruan, Y Qin, X Zhou, R Lai, H Jin, Y Dong, B Hou, MS Yu, Y Zhai, ...
arXiv preprint arXiv:2412.15803, 2024
2024
A System for Microserving of LLMs
H Jin, R Lai, CF Ruan, Y Wang, TC Mowry, X Miao, Z Jia, T Chen
arXiv preprint arXiv:2412.12488, 2024
2024