The deep learning compiler: A comprehensive survey M Li, Y Liu, X Liu, Q Sun, X You, H Yang, Z Luan, L Gan, G Yang, D Qian IEEE Transactions on Parallel and Distributed Systems 32 (3), 708-727, 2020 | 255 | 2020 |
Toward accelerated stencil computation by adapting tensor core unit on GPU X Liu, Y Liu, H Yang, J Liao, M Li, Z Luan, D Qian Proceedings of the 36th ACM International Conference on Supercomputing, 1-12, 2022 | 21 | 2022 |
Automatic code generation and optimization of large-scale stencil computation on many-core processors M Li, Y Liu, H Yang, Y Hu, Q Sun, B Chen, X You, X Liu, Z Luan, D Qian Proceedings of the 50th International Conference on Parallel Processing, 1-12, 2021 | 20 | 2021 |
EasyScale: Elastic Training with Consistent Accuracy and Improved Utilization on GPUs M Li, W Xiao, H Yang, B Sun, H Zhao, S Ren, Z Luan, X Jia, Y Liu, Y Li, ... Proceedings of the International Conference for High Performance Computing …, 2023 | 16* | 2023 |
CoGNN: efficient scheduling for concurrent GNN training on GPUs Q Sun, Y Liu, H Yang, R Zhang, M Dun, M Li, X Liu, W Xiao, Y Li, Z Luan, ... SC22: International Conference for High Performance Computing, Networking …, 2022 | 13 | 2022 |
Multi-role SpTRSV on Sunway many-core architecture M Li, Y Liu, H Yang, Z Luan, D Qian 2018 IEEE 20th International Conference on High Performance Computing and …, 2018 | 13 | 2018 |
Accelerating sparse Cholesky factorization on Sunway manycore architecture M Li, Y Liu, H Yang, Z Luan, L Gan, G Yang, D Qian IEEE Transactions on Parallel and Distributed Systems 31 (7), 1636-1650, 2019 | 12 | 2019 |
swTVM: exploring the automated compilation for deep learning on Sunway architecture C Liu, H Yang, R Sun, Z Luan, D Qian arXiv preprint arXiv:1904.07404, 2019 | 12 | 2019 |
swMR: A framework for accelerating MapReduce applications on Sunway TaihuLight X Zhong, M Li, H Yang, Y Liu, D Qian IEEE Transactions on Emerging Topics in Computing 9 (2), 1020-1030, 2018 | 9 | 2018 |
QoS-aware dynamic resource allocation with improved utilization and energy efficiency on GPU Q Sun, L Yi, H Yang, M Li, Z Luan, D Qian Parallel Computing 113, 102958, 2022 | 5 | 2022 |
Adapting combined tiling to stencil optimizations on Sunway processor B Sun, M Li, H Yang, J Xu, Z Luan, D Qian CCF Transactions on High Performance Computing 5 (3), 322-333, 2023 | 4 | 2023 |
Exploiting Input Tensor Dynamics in Activation Checkpointing for Efficient Training on GPU J Liao, M Li, H Yang, Q Sun, B Sun, J Hao, T Feng, F Yu, S Chen, Y Tao, ... 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2023 | 4* | 2023 |
Accelerating Large-Scale Sparse LU Factorization for RF Circuit Simulation G Feng, H Wang, Z Guo, M Li, T Zhao, Z Jin, W Jia, G Tan, N Sun European Conference on Parallel Processing, 182-195, 2024 | 3 | 2024 |
Exploiting Subgraph Similarities for Efficient Auto-tuning of Tensor Programs M Li, H Yang, S Zhang, F Yu, R Gong, Y Liu, Z Luan, D Qian Proceedings of the 52nd International Conference on Parallel Processing, 786-796, 2023 | 3* | 2023 |
PriPro: towards effective privacy protection on edge-cloud system running DNN inference R Gao, H Yang, S Huang, M Dun, M Li, Z Luan, Z Luan, D Qian 2021 IEEE/ACM 21st International Symposium on Cluster, Cloud and Internet …, 2021 | 2 | 2021 |
Block-checksum-based fault tolerance for matrix multiplication on large-scale parallel systems Y Zhu, Y Liu, M Li, D Qian 2018 IEEE 20th International Conference on High Performance Computing and …, 2018 | 2 | 2018 |
Towards optimized tensor code generation for deep learning on Sunway many-core processor M Li, C Liu, J Liao, X Zheng, H Yang, R Sun, J Xu, L Gan, G Yang, Z Luan, ... Frontiers of Computer Science 18 (2), 182101, 2024 | 1 | 2024 |
Building a domain-specific compiler for emerging processors with a reusable approach M Li, Y Liu, B Chen, H Yang, Z Luan, D Qian Science China Information Sciences 67 (1), 112101, 2024 | 1 | 2024 |
ElasticBatch: A Learning-Augmented Elastic Scheduling System for Batch Inference on MIG J Qi, W Xiao, M Li, C Yang, Y Li, W Lin, H Yang, Z Luan, D Qian IEEE Transactions on Parallel and Distributed Systems, 2024 | | 2024 |
swRodinia: A Benchmark Suite for Exploiting Architecture Properties of Sunway Processor B Chen, M Li, H Yang, Z Luan, L Gan, G Yang, D Qian Benchmarking, Measuring, and Optimizing: Third BenchCouncil International …, 2021 | | 2021 |