Gabriele Oliaro
Verified email at cs.cmu.edu - Homepage
Title
Cited by
Year
Specinfer: Accelerating large language model serving with tree-based speculative inference and verification
X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang, Z Zhang, RYY Wong, A Zhu, ...
Proceedings of the 29th ACM International Conference on Architectural …, 2024
Cited by: 208 | Year: 2024
Towards efficient generative large language model serving: A survey from algorithms to systems
X Miao, G Oliaro, Z Zhang, X Cheng, H Jin, T Chen, Z Jia
ACM Computing Surveys (CSUR) 57 (7), 2023
Cited by: 73 | Year: 2023
Direct Telemetry Access
J Langlet, R Ben Basat, G Oliaro, M Mitzenmacher, M Yu, G Antichi
SIGCOMM 2023, 2023
Cited by: 16 | Year: 2023
Zero-CPU collection with direct telemetry access
J Langlet, R Ben-Basat, S Ramanathan, G Oliaro, M Mitzenmacher, M Yu, ...
HotNets 2021, 2021
Cited by: 14 | Year: 2021
Quantized Side Tuning: Fast and Memory-Efficient Tuning of Quantized Large Language Models
Z Zhang, D Zhao, X Miao, G Oliaro, Q Li, Y Jiang, Z Jia
🏆 ACL 2024 (Outstanding paper award), 2024
Cited by: 7 | Year: 2024
FlexLLM: A System for Co-Serving Large Language Model Inference and Parameter-Efficient Finetuning
X Miao, G Oliaro, X Cheng, M Wu, C Unger, Z Jia
arXiv preprint arXiv:2402.18789, 2024
Cited by: 5 | Year: 2024
AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding
Z Li, Z Chen, R Delacourt, G Oliaro, Z Wang, Q Chen, S Lin, A Yang, ...
arXiv preprint arXiv:2501.12162, 2025
Year: 2025
SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference
G Oliaro, Z Jia, D Campos, A Qiao
arXiv preprint arXiv:2411.04975, 2024
Year: 2024
Optimal Kernel Orchestration for Tensor Programs with Korch
M Hu, A Venkatram, S Biswas, B Marimuthu, B Hou, G Oliaro, H Wang, ...
ASPLOS 2024, 2024
Year: 2024