ServerlessLLM: Low-latency serverless inference for large language models Y Fu, L Xue, Y Huang, AO Brabete, D Ustiugov, Y Patel, L Mai 18th USENIX Symposium on Operating Systems Design and Implementation, 135-153, 2024 | 20 | 2024 |
ServerlessLLM: Locality-Enhanced Serverless Inference for Large Language Models Y Fu, L Xue, Y Huang, AO Brabete, D Ustiugov, Y Patel, L Mai arXiv preprint arXiv:2401.14351, 2024 | 16 | 2024 |
MoE-CAP: Cost-Accuracy-Performance Benchmarking for Mixture-of-Experts Systems Y Fu, Y Jiang, Y Huang, P Nie, Z Lu, L Xue, C He, MK Sit, J Xue, L Dong, ... arXiv preprint arXiv:2412.07067, 2024 | 1 | 2024 |
ServerlessLLM: Low-Latency Serverless Inference for Large Language Models Y Fu, L Xue, Y Huang, AO Brabete, D Ustiugov, Y Patel, L Mai 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024 | | 2024 |
Dynamics of a tunable QED in quantum spin ice K Zhu, S Morampudi, Y Huang, Y Deng, F Wilczek APS March Meeting Abstracts 2022, K51.005, 2022 | | 2022 |