Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity S Cao, C Zhang, Z Yao, W Xiao, L Nie, D Zhan, Y Liu, M Wu, L Zhang Proceedings of the 2019 ACM/SIGDA International Symposium on Field …, 2019 | 213 | 2019 |
Balanced sparsity for efficient DNN inference on GPU Z Yao, S Cao, W Xiao, C Zhang, L Nie Proceedings of the AAAI conference on artificial intelligence 33 (01), 5676-5683, 2019 | 131 | 2019 |
SeerNet: Predicting convolutional neural network feature-map sparsity through low-bit quantization S Cao, L Ma, W Xiao, C Zhang, Y Liu, L Zhang, L Nie, Z Yang Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019 | 96 | 2019 |
EvoMoE: An evolutional mixture-of-experts training framework via dense-to-sparse gate X Nie, X Miao, S Cao, L Ma, Q Liu, J Xue, Y Miao, Y Liu, Z Yang, B Cui arXiv preprint arXiv:2112.14397, 2021 | 31 | 2021 |
Dense-to-sparse gate for mixture-of-experts X Nie, S Cao, X Miao, L Ma, J Xue, Y Miao, Z Yang, Z Yang, B Cui | 27 | 2021 |
Integer or floating point? New outlooks for low-bit quantization on large language models Y Zhang, L Zhao, S Cao, S Zhang, W Wang, T Cao, F Yang, M Yang, ... 2024 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2024 | 25 | 2024 |
Pre-gated MoE: An algorithm-system co-design for fast and scalable mixture-of-expert inference R Hwang, J Wei, S Cao, C Hwang, X Tang, T Cao, M Yang 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture …, 2024 | 25 | 2024 |
BitDistiller: Unleashing the potential of sub-4-bit LLMs via self-distillation D Du, Y Zhang, S Cao, J Guo, T Cao, X Chu, N Xu arXiv preprint arXiv:2402.10631, 2024 | 12 | 2024 |
NN-Stretch: Automatic neural network branching for parallel inference on heterogeneous multi-processors J Wei, T Cao, S Cao, S Jiang, S Fu, M Yang, Y Zhang, Y Liu Proceedings of the 21st Annual International Conference on Mobile Systems …, 2023 | 10 | 2023 |
Accurate and structured pruning for efficient automatic speech recognition H Jiang, LL Zhang, Y Li, Y Wu, S Cao, T Cao, Y Yang, J Li, M Yang, L Qiu arXiv preprint arXiv:2305.19549, 2023 | 9 | 2023 |
Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation L Wang, L Ma, S Cao, Q Zhang, J Xue, Y Shi, N Zheng, Z Miao, F Yang, ... 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024 | 8 | 2024 |
Efficient GPU kernels for N:M-sparse weights in deep learning B Lin, N Zheng, L Wang, S Cao, L Ma, Q Zhang, Y Zhu, T Cao, J Xue, ... Proceedings of Machine Learning and Systems 5, 513-525, 2023 | 8 | 2023 |
T-MAC: CPU renaissance via table lookup for low-bit LLM deployment on edge J Wei, S Cao, T Cao, L Ma, L Wang, Y Zhang, M Yang arXiv preprint arXiv:2407.00088, 2024 | 5 | 2024 |
AFPQ: Asymmetric floating point quantization for LLMs Y Zhang, S Zhang, S Cao, D Du, J Wei, T Cao, N Xu arXiv preprint arXiv:2311.01792, 2023 | 4 | 2023 |
Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training Y Zhang, Y Han, S Cao, G Dai, Y Miao, T Cao, F Yang, N Xu arXiv preprint arXiv:2305.19982, 2023 | 3 | 2023 |
SeerAttention: Learning intrinsic sparse attention in your LLMs Y Gao, Z Zeng, D Du, S Cao, HKH So, T Cao, F Yang, M Yang arXiv preprint arXiv:2410.13276, 2024 | 2 | 2024 |
LUT Tensor Core: Lookup table enables efficient low-bit LLM inference acceleration Z Mo, L Wang, J Wei, Z Zeng, S Cao, L Ma, N Jing, T Cao, J Xue, F Yang, ... arXiv preprint arXiv:2408.06003, 2024 | 2 | 2024 |
Dissecting Bit-Level Scaling Laws in Quantizing Vision Generative Models X Ding, S Cao, T Cao, Z Chen arXiv preprint arXiv:2501.06218, 2025 | | 2025 |
Fine-Grained Structured Sparse Computing for FPGA-Based AI Inference C Zhang, S Cao, G Dai, C Geng, Z Yao, W Xiao, Y Liu, M Wu, L Zhang, ... IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024 | | 2024 |
Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach Y Zhang, Z Gou, S Cao, W Feng, S Zhang, G Dai, N Xu arXiv preprint arXiv:2411.18873, 2024 | | 2024 |