Zili Zhang
Title
Cited by
Year
Fast distributed inference serving for large language models
B Wu, Y Zhong, Z Zhang, S Liu, F Liu, Y Sun, G Huang, X Liu, X Jin
arXiv preprint arXiv:2305.05920, 2023
Cited by 88 (2023)
Transparent GPU sharing in container clouds for deep learning workloads
B Wu, Z Zhang, Z Bai, X Liu, X Jin
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023
Cited by 45 (2023)
RAGCache: Efficient knowledge caching for retrieval-augmented generation
C Jin, Z Zhang, X Jiang, F Liu, X Liu, X Liu, X Jin
arXiv preprint arXiv:2404.12457, 2024
Cited by 35 (2024)
Ditto: Efficient serverless analytics with elastic parallelism
C Jin, Z Zhang, X Xiang, S Zou, G Huang, X Liu, X Jin
Proceedings of the ACM SIGCOMM 2023 Conference, 406-419, 2023
Cited by 17 (2023)
dLoRA: Dynamically orchestrating requests and adapters for LoRA LLM serving
B Wu, R Zhu, Z Zhang, P Sun, X Liu, X Jin
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024
Cited by 14 (2024)
Rise of distributed deep learning training in the big model era: From a software engineering perspective
X Liu, D Gu, Z Chen, J Wen, Z Zhang, Y Ma, H Wang, X Jin
ACM Transactions on Software Engineering and Methodology 32 (6), 1-26, 2023
Cited by 8 (2023)
DistTrain: Addressing model and data heterogeneity with disaggregated training for multimodal large language models
Z Zhang, Y Zhong, R Ming, H Hu, J Sun, Z Ge, Y Zhu, X Jin
arXiv preprint arXiv:2408.04275, 2024
Cited by 5 (2024)
Fast Vector Query Processing for Large Datasets Beyond GPU Memory with Reordered Pipelining
Z Zhang, F Liu, G Huang, X Liu, X Jin
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024
Cited by 5 (2024)
RLHFuse: Efficient RLHF training for large language models with inter- and intra-stage fusion
Y Zhong, Z Zhang, B Wu, S Liu, Y Chen, C Wan, H Hu, L Xia, R Ming, ...
arXiv preprint arXiv:2409.13221, 2024
Cited by 4 (2024)
Optimizing half precision Winograd convolution on ARM many-core processors
D Xie, Z Jia, Z Zhang, X Jin
Proceedings of the 13th ACM SIGOPS Asia-Pacific Workshop on Systems, 53-60, 2022
Cited by 4 (2022)
Fast, approximate vector queries on very large unstructured datasets
Z Zhang, C Jin, L Tang, X Liu, X Jin
20th USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2023
Cited by 3 (2023)
Jolteon: unleashing the promise of serverless for serverless workflows
Z Zhang, C Jin, X Jin
21st USENIX Symposium on Networked Systems Design and Implementation (NSDI …, 2024
Cited by 1 (2024)
Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
A Huang, B Wu, B Wang, C Yan, C Hu, C Feng, F Tian, F Shen, J Li, ...
arXiv preprint arXiv:2502.11946, 2025
2025
Articles 1–13