Shijie Cao
Microsoft Research Asia
Verified email at microsoft.com
Title
Cited by
Year
Efficient and effective sparse LSTM on FPGA with bank-balanced sparsity
S Cao, C Zhang, Z Yao, W Xiao, L Nie, D Zhan, Y Liu, M Wu, L Zhang
Proceedings of the 2019 ACM/SIGDA International Symposium on Field …, 2019
Cited by: 213, Year: 2019
Balanced sparsity for efficient DNN inference on GPU
Z Yao, S Cao, W Xiao, C Zhang, L Nie
Proceedings of the AAAI conference on artificial intelligence 33 (01), 5676-5683, 2019
Cited by: 131, Year: 2019
SeerNet: Predicting convolutional neural network feature-map sparsity through low-bit quantization
S Cao, L Ma, W Xiao, C Zhang, Y Liu, L Zhang, L Nie, Z Yang
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2019
Cited by: 96, Year: 2019
EvoMoE: An evolutional mixture-of-experts training framework via dense-to-sparse gate
X Nie, X Miao, S Cao, L Ma, Q Liu, J Xue, Y Miao, Y Liu, Z Yang, B Cui
arXiv preprint arXiv:2112.14397, 2021
Cited by: 31, Year: 2021
Dense-to-sparse gate for mixture-of-experts
X Nie, S Cao, X Miao, L Ma, J Xue, Y Miao, Z Yang, Z Yang, B Cui
Cited by: 27, Year: 2021
Integer or floating point? New outlooks for low-bit quantization on large language models
Y Zhang, L Zhao, S Cao, S Zhang, W Wang, T Cao, F Yang, M Yang, ...
2024 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2024
Cited by: 25, Year: 2024
Pre-gated MoE: An algorithm-system co-design for fast and scalable mixture-of-expert inference
R Hwang, J Wei, S Cao, C Hwang, X Tang, T Cao, M Yang
2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture …, 2024
Cited by: 25, Year: 2024
BitDistiller: Unleashing the potential of sub-4-bit LLMs via self-distillation
D Du, Y Zhang, S Cao, J Guo, T Cao, X Chu, N Xu
arXiv preprint arXiv:2402.10631, 2024
Cited by: 12, Year: 2024
NN-Stretch: Automatic neural network branching for parallel inference on heterogeneous multi-processors
J Wei, T Cao, S Cao, S Jiang, S Fu, M Yang, Y Zhang, Y Liu
Proceedings of the 21st Annual International Conference on Mobile Systems …, 2023
Cited by: 10, Year: 2023
Accurate and structured pruning for efficient automatic speech recognition
H Jiang, LL Zhang, Y Li, Y Wu, S Cao, T Cao, Y Yang, J Li, M Yang, L Qiu
arXiv preprint arXiv:2305.19549, 2023
Cited by: 9, Year: 2023
Ladder: Enabling Efficient Low-Precision Deep Learning Computing through Hardware-aware Tensor Transformation
L Wang, L Ma, S Cao, Q Zhang, J Xue, Y Shi, N Zheng, Z Miao, F Yang, ...
18th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2024
Cited by: 8, Year: 2024
Efficient GPU kernels for N:M-sparse weights in deep learning
B Lin, N Zheng, L Wang, S Cao, L Ma, Q Zhang, Y Zhu, T Cao, J Xue, ...
Proceedings of Machine Learning and Systems 5, 513-525, 2023
Cited by: 8, Year: 2023
T-MAC: CPU renaissance via table lookup for low-bit LLM deployment on edge
J Wei, S Cao, T Cao, L Ma, L Wang, Y Zhang, M Yang
arXiv preprint arXiv:2407.00088, 2024
Cited by: 5, Year: 2024
AFPQ: Asymmetric floating point quantization for LLMs
Y Zhang, S Zhang, S Cao, D Du, J Wei, T Cao, N Xu
arXiv preprint arXiv:2311.01792, 2023
Cited by: 4, Year: 2023
Adam Accumulation to Reduce Memory Footprints of both Activations and Gradients for Large-scale DNN Training
Y Zhang, Y Han, S Cao, G Dai, Y Miao, T Cao, F Yang, N Xu
arXiv preprint arXiv:2305.19982, 2023
Cited by: 3, Year: 2023
SeerAttention: Learning intrinsic sparse attention in your LLMs
Y Gao, Z Zeng, D Du, S Cao, HKH So, T Cao, F Yang, M Yang
arXiv preprint arXiv:2410.13276, 2024
Cited by: 2, Year: 2024
LUT Tensor Core: Lookup table enables efficient low-bit LLM inference acceleration
Z Mo, L Wang, J Wei, Z Zeng, S Cao, L Ma, N Jing, T Cao, J Xue, F Yang, ...
arXiv preprint arXiv:2408.06003, 2024
Cited by: 2, Year: 2024
Dissecting Bit-Level Scaling Laws in Quantizing Vision Generative Models
X Ding, S Cao, T Cao, Z Chen
arXiv preprint arXiv:2501.06218, 2025
Year: 2025
Fine-Grained Structured Sparse Computing for FPGA-Based AI Inference
C Zhang, S Cao, G Dai, C Geng, Z Yao, W Xiao, Y Liu, M Wu, L Zhang, ...
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2024
Year: 2024
Automating Energy-Efficient GPU Kernel Generation: A Fast Search-Based Compilation Approach
Y Zhang, Z Gou, S Cao, W Feng, S Zhang, G Dai, N Xu
arXiv preprint arXiv:2411.18873, 2024
Year: 2024