SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models. G Xiao, J Lin, M Seznec, H Wu, J Demouth, S Han. International Conference on Machine Learning (ICML), 38087-38099, 2023. Cited by 812.
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration. J Lin, J Tang, H Tang, S Yang, WM Chen, WC Wang, G Xiao, X Dang, ... Proceedings of Machine Learning and Systems 6, 87-100, 2024. Cited by 629.
Efficient Streaming Language Models with Attention Sinks. G Xiao, Y Tian, B Chen, S Han, M Lewis. International Conference on Learning Representations (ICLR), 2024. Cited by 409.
FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention. G Xiao, T Yin, WT Freeman, F Durand, S Han. International Journal of Computer Vision, 1-20, 2024. Cited by 171.
Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks. Z Zhang, G Xiao, Y Li, T Lv, F Qi, Z Liu, Y Wang, X Jiang, M Sun. Machine Intelligence Research 20 (2), 180-193, 2023. Cited by 93.
Offsite-Tuning: Transfer Learning without Full Model. G Xiao, J Lin, S Han. arXiv preprint arXiv:2302.04870, 2023. Cited by 66.
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving. Y Lin, H Tang, S Yang, Z Zhang, G Xiao, C Gan, S Han. arXiv preprint arXiv:2405.04532, 2024. Cited by 43.
Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference. J Tang, Y Zhao, K Zhu, G Xiao, B Kasikci, S Han. International Conference on Machine Learning (ICML), 2024. Cited by 31.
Retrieval Head Mechanistically Explains Long-Context Factuality. W Wu, Y Wang, G Xiao, H Peng, Y Fu. International Conference on Learning Representations (ICLR), 2025. Cited by 28.
InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory. C Xiao, P Zhang, X Han, G Xiao, Y Lin, Z Zhang, Z Liu, S Han, M Sun. Advances in Neural Information Processing Systems (NeurIPS), 2024. Cited by 25.
BitDelta: Your Fine-Tune May Only Be Worth One Bit. J Liu, G Xiao, K Li, JD Lee, S Han, T Dao, T Cai. Advances in Neural Information Processing Systems (NeurIPS), 2024. Cited by 16.
DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads. G Xiao, J Tang, J Zuo, J Guo, S Yang, H Tang, Y Fu, S Han. International Conference on Learning Representations (ICLR), 2025. Cited by 8.
ReFresh: Reducing Memory Access from Exploiting Stable Historical Embeddings for Graph Neural Network Training. K Huang, H Jiang, M Wang, G Xiao, D Wipf, X Song, Q Gan, Z Huang, ... arXiv preprint arXiv:2301.07482, 2023. Cited by 7.
FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training. K Huang, H Jiang, M Wang, G Xiao, D Wipf, X Song, Q Gan, Z Huang, ... Proceedings of the VLDB Endowment 17 (6), 1473-1486, 2024. Cited by 4.
AWQ: Activation-aware Weight Quantization for On-Device LLM Compression and Acceleration. J Lin, J Tang, H Tang, S Yang, G Xiao, S Han. GetMobile: Mobile Computing and Communications 28 (4), 12-17, 2025.
Efficient Deployment Algorithms for Large Language Models. G Xiao. Massachusetts Institute of Technology, 2024.
Sparse and Local Networks for Hypergraph Reasoning. G Xiao, LP Kaelbling, J Wu, J Mao. Learning on Graphs Conference, 34:1-34:16, 2022.