Yilong Zhao
Ph.D. student, UC Berkeley
Verified email at berkeley.edu - Homepage
Title
Cited by
Year
Deepseek-v2: A strong, economical, and efficient mixture-of-experts language model
A Liu, B Feng, B Wang, B Wang, B Liu, C Zhao, C Dengr, C Ruan, D Dai, ...
arXiv preprint arXiv:2405.04434, 2024
Cited by: 105; Year: 2024
Atom: Low-bit Quantization for Efficient and Accurate LLM Serving
Y Zhao, CY Lin, K Zhu, Z Ye, L Chen, S Zheng, L Ceze, A Krishnamurthy, ...
Proceedings of Machine Learning and Systems 6, 196-209, 2023
Cited by: 91; Year: 2023
QUEST: Query-Aware Sparsity for Efficient Long-Context LLM Inference
J Tang, Y Zhao, K Zhu, G Xiao, B Kasikci, S Han
Forty-first International Conference on Machine Learning, 2024
Cited by: 36*; Year: 2024
Accelerating self-attentions for llm serving with flashinfer
Z Ye, L Chen, R Lai, Y Zhao, S Zheng, J Shao, B Hou, H Jin, Y Zuo, L Yin, ...
URL https://flashinfer.ai/2024/02/02/introduce-flashinfer.html, 2024
Cited by: 11; Year: 2024
Nanoflow: Towards optimal large language model serving throughput
K Zhu, Y Zhao, L Zhao, G Zuo, Y Gu, D Xie, Y Gao, Q Xu, T Tang, Z Ye, ...
arXiv preprint arXiv:2408.12757, 2024
Cited by: 10; Year: 2024
Xgrammar: Flexible and efficient structured generation engine for large language models
Y Dong, CF Ruan, Y Cai, R Lai, Z Xu, Y Zhao, T Chen
arXiv preprint arXiv:2411.15100, 2024
Cited by: 2; Year: 2024
BlendServe: Optimizing Offline Inference for Auto-regressive Large Models with Resource-aware Batching
Y Zhao, S Yang, K Zhu, L Zheng, B Kasikci, Y Zhou, J Xing, I Stoica
arXiv preprint arXiv:2411.16102, 2024
Cited by: 1; Year: 2024
Microless: Cost-Efficient Hybrid Deployment of Microservices on IaaS VMs and Serverless
J Cheng, Y Zhao, Z Li, Q Chen, W Cui, M Guo
2023 IEEE 29th International Conference on Parallel and Distributed Systems …, 2023
Cited by: 1; Year: 2023
Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
H Xi, S Yang, Y Zhao, C Xu, M Li, X Li, Y Lin, H Cai, J Zhang, D Li, J Chen, ...
arXiv preprint arXiv:2502.01776, 2025
Year: 2025
Serverless Computing based on Dynamic-Addressable Session
Z Li, Y Zhao, Q Chen, M Guo
SCIENTIA SINICA Informationis 54 (3), 582-602, 2024
Year: 2024