Guangxuan Xiao
Ph.D. candidate, MIT
Verified email at mit.edu - Homepage
Title · Cited by · Year
SmoothQuant: Accurate and efficient post-training quantization for large language models
G Xiao, J Lin, M Seznec, H Wu, J Demouth, S Han
International Conference on Machine Learning, 38087-38099, 2023
Cited by 851 · 2023
AWQ: Activation-aware weight quantization for on-device LLM compression and acceleration
J Lin, J Tang, H Tang, S Yang, WM Chen, WC Wang, G Xiao, X Dang, ...
Proceedings of Machine Learning and Systems 6, 87-100, 2024
Cited by 687 · 2024
Efficient streaming language models with attention sinks
G Xiao, Y Tian, B Chen, S Han, M Lewis
International Conference on Learning Representations (ICLR), 2024
Cited by 479 · 2024
FastComposer: Tuning-free multi-subject image generation with localized attention
G Xiao, T Yin, WT Freeman, F Durand, S Han
International Journal of Computer Vision, 1-20, 2024
Cited by 181 · 2024
Red alarm for pre-trained models: Universal vulnerability to neuron-level backdoor attacks
Z Zhang, G Xiao, Y Li, T Lv, F Qi, Z Liu, Y Wang, X Jiang, M Sun
Machine Intelligence Research 20 (2), 180-193, 2023
Cited by 98 · 2023
Offsite-tuning: Transfer learning without full model
G Xiao, J Lin, S Han
arXiv preprint arXiv:2302.04870, 2023
Cited by 68 · 2023
QServe: W4A8KV4 quantization and system co-design for efficient LLM serving
Y Lin, H Tang, S Yang, Z Zhang, G Xiao, C Gan, S Han
arXiv preprint arXiv:2405.04532, 2024
Cited by 48 · 2024
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
J Tang, Y Zhao, K Zhu, G Xiao, B Kasikci, S Han
ICML 2024, 2024
Cited by 47 · 2024
Retrieval head mechanistically explains long-context factuality
W Wu, Y Wang, G Xiao, H Peng, Y Fu
International Conference on Learning Representations (ICLR), 2025
Cited by 38 · 2025
InfLLM: Unveiling the intrinsic capacity of LLMs for understanding extremely long sequences with training-free memory
C Xiao, P Zhang, X Han, G Xiao, Y Lin, Z Zhang, Z Liu, S Han, M Sun
NeurIPS 2024, 2024
Cited by 34 · 2024
DuoAttention: Efficient long-context LLM inference with retrieval and streaming heads
G Xiao, J Tang, J Zuo, J Guo, S Yang, H Tang, Y Fu, S Han
International Conference on Learning Representations (ICLR), 2025
Cited by 18 · 2025
BitDelta: Your fine-tune may only be worth one bit
J Liu, G Xiao, K Li, JD Lee, S Han, T Dao, T Cai
NeurIPS 2024, 2024
Cited by 16 · 2024
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
J Lin, J Tang, H Tang, S Yang, G Xiao, S Han
GetMobile: Mobile Computing and Communications 28 (4), 12-17, 2025
Cited by 8 · 2025
ReFresh: Reducing memory access from exploiting stable historical embeddings for graph neural network training
K Huang, H Jiang, M Wang, G Xiao, D Wipf, X Song, Q Gan, Z Huang, ...
arXiv preprint arXiv:2301.07482, 2023
Cited by 7 · 2023
FreshGNN: reducing memory access via stable historical embeddings for graph neural network training
K Huang, H Jiang, M Wang, G Xiao, D Wipf, X Song, Q Gan, Z Huang, ...
arXiv preprint arXiv:2301.07482, 2023
Cited by 6 · 2023
LServe: Efficient Long-sequence LLM Serving with Unified Sparse Attention
S Yang, J Guo, H Tang, Q Hu, G Xiao, J Tang, Y Lin, Z Liu, Y Lu, S Han
arXiv preprint arXiv:2502.14866, 2025
2025
Efficient Deployment Algorithms for Large Language Models
G Xiao
Massachusetts Institute of Technology, 2024
2024