SpecInfer: Accelerating Large Language Model Serving with Tree-Based Speculative Inference and Verification. X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang, Z Zhang, RYY Wong, A Zhu, et al. Proceedings of the 29th ACM International Conference on Architectural …, 2024. Cited by 208.
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems. X Miao, G Oliaro, Z Zhang, X Cheng, H Jin, T Chen, Z Jia. ACM Computing Surveys (CSUR) 57 (7), 2023. Cited by 73.
Direct Telemetry Access. J Langlet, R Ben Basat, G Oliaro, M Mitzenmacher, M Yu, G Antichi. SIGCOMM 2023. Cited by 16.
Zero-CPU Collection with Direct Telemetry Access. J Langlet, R Ben-Basat, S Ramanathan, G Oliaro, M Mitzenmacher, M Yu, et al. HotNets 2021. Cited by 14.
Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models. Z Zhang, D Zhao, X Miao, G Oliaro, Q Li, Y Jiang, Z Jia. ACL 2024 (🏆 Outstanding Paper Award). Cited by 7.
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning. X Miao, G Oliaro, X Cheng, M Wu, C Unger, Z Jia. arXiv preprint arXiv:2402.18789, 2024. Cited by 5.
AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding. Z Li, Z Chen, R Delacourt, G Oliaro, Z Wang, Q Chen, S Lin, A Yang, et al. arXiv preprint arXiv:2501.12162, 2025.
SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference. G Oliaro, Z Jia, D Campos, A Qiao. arXiv preprint arXiv:2411.04975, 2024.
Optimal Kernel Orchestration for Tensor Programs with Korch. M Hu, A Venkatram, S Biswas, B Marimuthu, B Hou, G Oliaro, H Wang, et al. ASPLOS 2024.