Shiyao Li (李师尧)
Ph.D. student, Tsinghua University
Verified email at mails.tsinghua.edu.cn
Title
Cited by
Year
A survey on efficient inference for large language models
Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou, L Wang, Z Yuan, X Li, ...
arXiv preprint arXiv:2404.14294, 2024
Cited by: 77; Year: 2024
Evaluating quantized large language models
S Li, X Ning, L Wang, T Liu, X Shi, S Yan, G Dai, H Yang, Y Wang
Forty-first International Conference on Machine Learning, 2024
Cited by: 49; Year: 2024
FlightLLM: Efficient large language model inference with a complete mapping flow on FPGAs
S Zeng, J Liu, G Dai, X Yang, T Fu, H Wang, W Ma, H Sun, S Li, Z Huang, ...
Proceedings of the 2024 ACM/SIGDA International Symposium on Field …, 2024
Cited by: 44; Year: 2024
LV-Eval: A balanced long-context benchmark with 5 length levels up to 256K
T Yuan, X Ning, D Zhou, Z Yang, S Li, M Zhuang, Z Tan, Z Yao, D Lin, B Li, ...
arXiv preprint arXiv:2402.05136, 2024
Cited by: 17; Year: 2024
LLM-MQ: Mixed-precision Quantization for Efficient LLM Deployment
S Li, X Ning, K Hong, T Liu, L Wang, X Li, K Zhong, G Dai, H Yang, ...
NeurIPS 2023 Efficient Natural Language and Speech Processing Workshop, 2023
Cited by: 16*
MoA: Mixture of sparse attention for automatic large language model compression
T Fu, H Huang, X Ning, G Zhang, B Chen, T Wu, H Wang, Z Huang, S Li, ...
arXiv preprint arXiv:2406.14909, 2024
Cited by: 12; Year: 2024
ViDiT-Q: Efficient and accurate quantization of diffusion transformers for image and video generation
T Zhao, T Fang, E Liu, R Wan, W Soedarmadji, S Li, Z Lin, G Dai, S Yan, ...
arXiv preprint arXiv:2406.02540, 2024
Cited by: 9; Year: 2024
A unified FPGA virtualization framework for general-purpose deep neural networks in the cloud
S Zeng, G Dai, H Sun, J Liu, S Li, G Ge, K Zhong, K Guo, Y Wang, H Yang
ACM Transactions on Reconfigurable Technology and Systems (TRETS) 15 (3), 1-31, 2021
Cited by: 5; Year: 2021
Towards high-accuracy and real-time two-stage small object detection on FPGA
S Li, Z Zhu, H Sun, X Ning, G Dai, Y Hu, H Yang, Y Wang
IEEE Transactions on Circuits and Systems for Video Technology, 2024
Cited by: 4; Year: 2024
Can LLMs learn by teaching for better reasoning? A preliminary study
X Ning, Z Wang, S Li, Z Lin, P Yao, T Fu, MB Blaschko, G Dai, H Yang, ...
arXiv preprint arXiv:2406.14629, 2024
Cited by: 2; Year: 2024
TCP: Triplet contrastive-relationship preserving for class-incremental learning
S Li, X Ning, S Zhang, L Guo, T Zhao, H Yang, Y Wang
Proceedings of the IEEE/CVF Winter Conference on Applications of Computer …, 2024
Cited by: 2; Year: 2024
Enabling fast 2-bit LLM on GPUs: Memory alignment, sparse outlier, and asynchronous dequantization
J Li, S Li, J Xu, S Huang, Y Lian, J Liu, Y Wang, G Dai
arXiv preprint arXiv:2311.16442, 2023
Cited by: 2; Year: 2023
Memory-efficient and real-time SPAD-based dToF depth sensor with spatial and statistical correlation
S Li, Z Zhu, Y Zhu, Q Zhu, J Zhang, W Sun, G Dai, F Qiao, H Yang, ...
2023 60th ACM/IEEE Design Automation Conference (DAC), 1-6, 2023
Cited by: 1; Year: 2023
MBQ: Modality-Balanced Quantization for Large Vision-Language Models
S Li, Y Hu, X Ning, X Liu, K Hong, X Jia, X Li, Y Yan, P Ran, G Dai, S Yan, ...
arXiv preprint arXiv:2412.19509, 2024
Year: 2024
CSKV: Training-Efficient Channel Shrinking for KV Cache in Long-Context Scenarios
L Wang, S Li, X Ning, Z Yuan, S Yan, G Dai, Y Wang
arXiv preprint arXiv:2409.10593, 2024
Year: 2024
Towards Floating Point-Based Attention-Free LLM: Hybrid PIM with Non-Uniform Data Format and Reduced Multiplications
L Guo, Z Zhu, T Liu, X Ning, S Li, G Dai, H Yang, W Fu, Y Wang
Year: 2024