Zhuoming Chen
Verified email at andrew.cmu.edu
Title
Cited by
Year
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
X Miao, G Oliaro, Z Zhang, X Cheng, Z Wang, RYY Wong, Z Chen, ...
arXiv preprint arXiv:2305.09781 1 (2), 4, 2023
216* · 2023
Quantized training of gradient boosting decision trees
Y Shi, G Ke, Z Chen, S Zheng, TY Liu
Advances in neural information processing systems 35, 18822-18833, 2022
29 · 2022
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
H Sun, Z Chen, X Yang, Y Tian, B Chen
COLM 2024, 2024
26 · 2024
Sequoia: Scalable and Robust Speculative Decoding
Z Chen, A May, R Svirschevski, YH Huang, M Ryabinin, Z Jia, B Chen
NeurIPS 2024 (Spotlight), 2024
24* · 2024
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
J Chen, V Tiwari, R Sadhukhan, Z Chen, J Shi, IEH Yen, B Chen
arXiv preprint arXiv:2408.11049, 2024
10 · 2024
SpecExec: Massively Parallel Speculative Decoding for Interactive LLM Inference on Consumer Devices
R Svirschevski, A May, Z Chen, B Chen, Z Jia, M Ryabinin
NeurIPS 2024, 2024
9 · 2024
MagicPIG: LSH Sampling for Efficient LLM Generation
Z Chen, R Sadhukhan, Z Ye, Y Zhou, J Zhang, N Nolte, Y Tian, M Douze, ...
ICLR 2024 (Spotlight), 2024
8 · 2024
GNNPipe: Scaling Deep GNN Training with Pipelined Model Parallelism
J Chen, Z Chen, X Qian
HPCA 2025, 2023
4* · 2023
Sirius: Contextual Sparsity with Correction for Efficient LLMs
Y Zhou, Z Chen, Z Xu, V Lin, B Chen
NeurIPS 2024, 2024
2 · 2024
MINI-SEQUENCE TRANSFORMER: Optimizing Intermediate Memory for Long Sequences Training
C Luo, J Zhao, Z Chen, B Chen, A Anandkumar
NeurIPS 2024, 2024
1 · 2024
Quark: A Gradient-Free Quantum Learning Framework for Classification Tasks
Z Zhang, Z Chen, H Huang, Z Jia
1 · 2022
GSM-Infinite: How Do Your LLMs Behave over Infinitely Increasing Context Length and Reasoning Complexity?
Y Zhou, H Liu, Z Chen, Y Tian, B Chen
arXiv preprint arXiv:2502.05252, 2025
2025
AdaServe: SLO-Customized LLM Serving with Fine-Grained Speculative Decoding
Z Li, Z Chen, R Delacourt, G Oliaro, Z Wang, Q Chen, S Lin, A Yang, ...
arXiv preprint arXiv:2501.12162, 2025
2025
Articles 1–13