Xiuhong Li

Cited by

	All	Since 2020
Citations	676	553
h-index	14	12
i10-index	18	13

Citations per year

Year	Citations
2016	12
2017	20
2018	35
2019	53
2020	50
2021	54
2022	72
2023	68
2024	245
2025	62

Public access

16 articles available
0 articles not available

Based on funding mandates

Co-authors

Yun (Eric) Liang; Professor of EECS, Peking University, ACM Distinguished Scientist; verified email at pku.edu.cn
Shengen Yan; Department of Electronic Engineering, Tsinghua University, China; verified email at tsinghua.edu.cn
Xiaolong Xie; Research Engineer, Damo Academy, Alibaba Group; verified email at alibaba-inc.com
Jiangfei Duan; The Chinese University of Hong Kong; verified email at ie.cuhk.edu.hk
Xingcheng ZHANG; Shanghai AI lab; verified email at sensetime.com
Size Zheng; Peking University; verified email at pku.edu.cn
Renze Chen; Peking University; verified email at pku.edu.cn
Ke Hong; Tsinghua University; verified email at mails.tsinghua.edu.cn
Guohao Dai（戴国浩）; Associate Professor of Shanghai Jiao Tong University; verified email at sjtu.edu.cn
Zhihang Yuan; Infini-AI; verified email at infini-ai.com
Xuechao Wei; HYGON; verified email at pku.edu.cn


Xiuhong Li

Peking University

Verified email at pku.edu.cn

Deep Learning System, GPGPU, Deep Learning Compiler


Title	Authors	Venue	Cited by	Year
TGPA: Tile-grained pipeline architecture for low latency CNN inference	X Wei, Y Liang, X Li, CH Yu, P Zhang, J Cong	2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 1-8, 2018	86	2018
Enabling coordinated register allocation and thread-level parallelism optimization for GPUs	X Xie, Y Liang, X Li, Y Wu, G Sun, T Wang, D Fan	Proceedings of the 48th International Symposium on Microarchitecture, 395-406, 2015	85	2015
A survey on efficient inference for large language models	Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou, L Wang, Z Yuan, X Li, ...	arXiv preprint arXiv:2404.14294, 2024	81	2024
A coordinated tiling and batching framework for efficient GEMM on GPUs	X Li, Y Liang, S Yan, L Jia, Y Li	Proceedings of the 24th symposium on principles and practice of parallel …, 2019	70	2019
AMOS: enabling automatic mapping for tensor computations on spatial accelerators with hardware abstraction	S Zheng, R Chen, A Wei, Y Jin, Q Han, L Lu, B Wu, X Li, S Yan, Y Liang	Proceedings of the 49th Annual International Symposium on Computer …, 2022	59	2022
FlashDecoding++: Faster large language model inference on GPUs	K Hong, G Dai, J Xu, Q Mao, X Li, J Liu, K Chen, Y Dong, Y Wang	arXiv preprint arXiv:2311.01282, 2023	50	2023
Chimera: An analytical optimizing framework for effective compute-intensive operators fusion	S Zheng, S Chen, P Song, R Chen, X Li, S Yan, D Lin, J Leng, Y Liang	2023 IEEE International Symposium on High-Performance Computer Architecture …, 2023	29	2023
Enabling efficient fast convolution algorithms on GPUs via MegaKernels	L Jia, Y Liang, X Li, L Lu, S Yan	IEEE Transactions on Computers 69 (7), 986-997, 2020	25	2020
Performance-centric register file design for GPUs using racetrack memory	S Wang, Y Liang, C Zhang, X Xie, G Sun, Y Liu, Y Wang, X Li	2016 21st Asia and South Pacific Design Automation Conference (ASP-DAC), 25-30, 2016	20	2016
SKVQ: Sliding-window key and value cache quantization for large language models	H Duanmu, Z Yuan, X Li, J Duan, X Zhang, D Lin	arXiv preprint arXiv:2405.06219, 2024	18	2024
CRAT: Enabling coordinated register allocation and thread-level parallelism optimization for GPUs	X Xie, Y Liang, X Li, Y Wu, G Sun, T Wang, D Fan	IEEE Transactions on Computers 67 (6), 890-897, 2017	17	2017
Efficient kernel management on GPUs	X Li, Y Liang	2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), 85-90, 2016	16	2016
Centauri: Enabling efficient scheduling for communication-computation overlap in large model training via communication partitioning	C Chen, X Li, Q Zhu, J Duan, P Sun, X Zhang, C Yang	Proceedings of the 29th ACM International Conference on Architectural …, 2024	15	2024
cuMBIR: An efficient framework for low-dose X-ray CT image reconstruction on GPUs	X Li, Y Liang, W Zhang, T Liu, H Li, G Luo, M Jiang	Proceedings of the 2018 International Conference on Supercomputing, 184-194, 2018	14	2018
NeoFlow: A flexible framework for enabling efficient compilation for high performance DNN training	S Zheng, R Chen, Y Jin, A Wei, B Wu, X Li, S Yan, Y Liang	IEEE Transactions on Parallel and Distributed Systems 33 (11), 3220-3232, 2021	13	2021
Exploring cache bypassing and partitioning for multi-tasking on GPUs	Y Liang, X Li, X Xie	2017 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), 9-16, 2017	12	2017
FlashDecoding++: Faster large language model inference with asynchronization, flat GEMM optimization, and heuristics	K Hong, G Dai, J Xu, Q Mao, X Li, J Liu, Y Dong, Y Wang	Proceedings of Machine Learning and Systems 6, 148-161, 2024	11	2024
Efficient kernel management on GPUs	Y Liang, X Li	ACM Transactions on Embedded Computing Systems (TECS) 16 (4), 1-24, 2017	10	2017
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention	Q Zhu, J Duan, C Chen, S Liu, X Li, G Feng, X Lv, H Cao, X Chuanfu, ...	arXiv preprint arXiv:2406.15486, 2024	8	2024
CuLDA: solving large-scale LDA Problems on GPUs	X Xie, Y Liang, X Li, W Tan	Proceedings of the 28th International Symposium on High-Performance Parallel …, 2019	8	2019
