SparseTIR: Composable abstractions for sparse compilation in deep learning Z Ye, R Lai, J Shao, T Chen, L Ceze Proceedings of the 28th ACM International Conference on Architectural …, 2023 | 82 | 2023 |
TensorIR: An abstraction for automatic tensorized program optimization S Feng, B Hou, H Jin, W Lin, J Shao, R Lai, Z Ye, L Zheng, CH Yu, Y Yu, ... Proceedings of the 28th ACM International Conference on Architectural …, 2023 | 73 | 2023 |
Tensor program optimization with probabilistic programs J Shao, X Zhou, S Feng, B Hou, R Lai, H Jin, W Lin, M Masuda, CH Yu, ... Advances in Neural Information Processing Systems 35, 35783-35796, 2022 | 28 | 2022 |
Accelerating self-attentions for LLM serving with FlashInfer Z Ye, L Chen, R Lai, Y Zhao, S Zheng, J Shao, B Hou, H Jin, Y Zuo, L Yin, ... URL https://flashinfer.ai/2024/02/02/introduce-flashinfer.html, 2024 | 13* | 2024 |
Cascade inference: Memory bandwidth efficient shared prefix batch decoding Z Ye, R Lai, BR Lu, CY Lin, S Zheng, L Chen, T Chen, L Ceze | 13 | 2024 |
Relax: Composable Abstractions for End-to-End Dynamic Machine Learning R Lai, J Shao, S Feng, SS Lyubomirsky, B Hou, W Lin, Z Ye, H Jin, Y Jin, ... arXiv preprint arXiv:2311.02103, 2023 | 10 | 2023 |
XGrammar: Flexible and efficient structured generation engine for large language models Y Dong, CF Ruan, Y Cai, R Lai, Z Xu, Y Zhao, T Chen arXiv preprint arXiv:2411.15100, 2024 | 4 | 2024 |
FlashInfer: Efficient and customizable attention engine for LLM inference serving Z Ye, L Chen, R Lai, W Lin, Y Zhang, S Wang, T Chen, B Kasikci, V Grover, ... arXiv preprint arXiv:2501.01005, 2025 | 2 | 2025 |
Emerging Platforms Meet Emerging LLMs: A Year-Long Journey of Top-Down Development S Feng, J Liu, R Lai, CF Ruan, Y Yu, L Zhang, T Chen arXiv preprint arXiv:2404.09151, 2024 | 1 | 2024 |
WebLLM: A High-Performance In-Browser LLM Inference Engine CF Ruan, Y Qin, X Zhou, R Lai, H Jin, Y Dong, B Hou, MS Yu, Y Zhai, ... arXiv preprint arXiv:2412.15803, 2024 | | 2024 |
A System for Microserving of LLMs H Jin, R Lai, CF Ruan, Y Wang, TC Mowry, X Miao, Z Jia, T Chen arXiv preprint arXiv:2412.12488, 2024 | | 2024 |