Follow
Guangxuan Xiao
Ph.D. candidate, MIT
Verified email at mit.edu - Homepage
Title
Cited by
Year
SmoothQuant: Accurate and efficient post-training quantization for large language models
G Xiao, J Lin, M Seznec, H Wu, J Demouth, S Han
International Conference on Machine Learning, 38087-38099, 2023
812 · 2023
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
J Lin, J Tang, H Tang, S Yang, WM Chen, WC Wang, G Xiao, X Dang, ...
Proceedings of Machine Learning and Systems 6, 87-100, 2024
629 · 2024
Efficient streaming language models with attention sinks
G Xiao, Y Tian, B Chen, S Han, M Lewis
International Conference on Learning Representations (ICLR), 2024
409 · 2024
FastComposer: Tuning-free multi-subject image generation with localized attention
G Xiao, T Yin, WT Freeman, F Durand, S Han
International Journal of Computer Vision, 1-20, 2024
171 · 2024
Red alarm for pre-trained models: Universal vulnerability to neuron-level backdoor attacks
Z Zhang, G Xiao, Y Li, T Lv, F Qi, Z Liu, Y Wang, X Jiang, M Sun
Machine Intelligence Research 20 (2), 180-193, 2023
93 · 2023
Offsite-tuning: Transfer learning without full model
G Xiao, J Lin, S Han
arXiv preprint arXiv:2302.04870, 2023
66 · 2023
QServe: W4A8KV4 quantization and system co-design for efficient LLM serving
Y Lin, H Tang, S Yang, Z Zhang, G Xiao, C Gan, S Han
arXiv preprint arXiv:2405.04532, 2024
43 · 2024
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
J Tang, Y Zhao, K Zhu, G Xiao, B Kasikci, S Han
ICML 2024, 2024
31 · 2024
Retrieval head mechanistically explains long-context factuality
W Wu, Y Wang, G Xiao, H Peng, Y Fu
International Conference on Learning Representations (ICLR), 2025
28 · 2025
InfLLM: Unveiling the intrinsic capacity of LLMs for understanding extremely long sequences with training-free memory
C Xiao, P Zhang, X Han, G Xiao, Y Lin, Z Zhang, Z Liu, S Han, M Sun
NeurIPS 2024, 2024
25 · 2024
BitDelta: Your fine-tune may only be worth one bit
J Liu, G Xiao, K Li, JD Lee, S Han, T Dao, T Cai
NeurIPS 2024, 2024
16 · 2024
DuoAttention: Efficient long-context LLM inference with retrieval and streaming heads
G Xiao, J Tang, J Zuo, J Guo, S Yang, H Tang, Y Fu, S Han
International Conference on Learning Representations (ICLR), 2025
8 · 2025
ReFresh: Reducing memory access from exploiting stable historical embeddings for graph neural network training
K Huang, H Jiang, M Wang, G Xiao, D Wipf, X Song, Q Gan, Z Huang, ...
arXiv preprint arXiv:2301.07482, 2023
7 · 2023
FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training
K Huang, H Jiang, M Wang, G Xiao, D Wipf, X Song, Q Gan, Z Huang, ...
Proceedings of the VLDB Endowment 17 (6), 1473-1486, 2024
4 · 2024
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration
J Lin, J Tang, H Tang, S Yang, G Xiao, S Han
GetMobile: Mobile Computing and Communications 28 (4), 12-17, 2025
2025
Efficient Deployment Algorithms for Large Language Models
G Xiao
Massachusetts Institute of Technology, 2024
2024
Sparse and Local Networks for Hypergraph Reasoning
G Xiao, LP Kaelbling, J Wu, J Mao
Learning on Graphs Conference, 34: 1-34: 16, 2022
2022
Articles 1–17