Ansor: Generating High-Performance tensor programs for deep learning L Zheng, C Jia, M Sun, Z Wu, CH Yu, A Haj-Ali, Y Wang, J Yang, D Zhuo, ... 14th USENIX symposium on operating systems design and implementation (OSDI …, 2020 | 431 | 2020 |
DAPPLE: A pipelined data parallel approach for training large models S Fan, Y Rong, C Meng, Z Cao, S Wang, Z Zheng, C Wu, G Long, J Yang, ... Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of …, 2021 | 238 | 2021 |
Training deeper models by GPU memory optimization on TensorFlow C Meng, M Sun, J Yang, M Qiu, Y Gu Proc. of ML Systems Workshop in NIPS 7, 2017 | 104 | 2017 |
AStitch: enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures Z Zheng, X Yang, P Zhao, G Long, K Zhu, F Zhu, W Zhao, X Liu, J Yang, ... Proceedings of the 27th ACM International Conference on Architectural …, 2022 | 67 | 2022 |
Characterizing deep learning training workloads on Alibaba-PAI M Wang, C Meng, G Long, C Wu, J Yang, W Lin, Y Jia 2019 IEEE international symposium on workload characterization (IISWC), 189-202, 2019 | 64 | 2019 |
Pyramid Embedded Generative Adversarial Network for Automated Font Generation D Sun, Q Zhang, J Yang 2018 24th International Conference on Pattern Recognition (ICPR), 976-981, 2018 | 43 | 2018 |
CrashTuner: detecting crash-recovery bugs in cloud systems via meta-info analysis J Lu, C Liu, L Li, X Feng, F Tan, J Yang, L You Proceedings of the 27th ACM Symposium on Operating Systems Principles, 114-130, 2019 | 37 | 2019 |
FusionStitching: Boosting Memory Intensive Computations for Deep Learning Workloads Z Zheng, P Zhao, G Long, F Zhu, K Zhu, W Zhao, L Diao, J Yang, W Lin arXiv preprint arXiv:2009.10924, 2020 | 33 | 2020 |
DISC: A dynamic shape compiler for machine learning workloads K Zhu, WY Zhao, Z Zheng, TY Guo, PZ Zhao, JJ Bai, J Yang, XY Liu, ... Proceedings of the 1st Workshop on Machine Learning and Systems, 89-95, 2021 | 31 | 2021 |
Optimizing Distributed Training Deployment in Heterogeneous GPU Clusters WL X. Yi, S. Zhang, Z. Luo, G. Long, L. Diao, C. Wu, Z. Zheng, J. Yang ACM CoNEXT, 2020 | 31* | 2020 |
You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient S Zhang, X Zheng, C Yang, Y Li, Y Wang, F Chao, M Wang, S Li, J Yang, ... arXiv preprint arXiv:2106.02435, 2021 | 26 | 2021 |
Detecting TensorFlow program bugs in real-world industrial environment C Liu, J Lu, G Li, T Yuan, L Li, F Tan, J Yang, L You, J Xue 2021 36th IEEE/ACM International Conference on Automated Software …, 2021 | 19 | 2021 |
Efficient Deep Learning Inference based on Model Compression Q Zhang, M Zhang, M Wang, W Sui, C Meng, J Yang, W Kong, X Cui, ... Proceedings of the IEEE Conference on Computer Vision and Pattern …, 2018 | 17 | 2018 |
INT8 Winograd Acceleration for Conv1D Equipped ASR Models Deployed on Mobile Devices Y Yao, Y Li, C Wang, T Yu, H Chen, X Jiang, J Yang, J Huang, W Lin, ... arXiv preprint arXiv:2010.14841, 2020 | 11 | 2020 |
FusionStitching: Deep Fusion and Code Generation for Tensorflow Computations on GPUs G Long, J Yang, K Zhu, W Lin arXiv preprint arXiv:1811.05213, 2018 | 11 | 2018 |
A Novel Integrated Framework for Learning both Text Detection and Recognition W Sui, Q Zhang, J Yang, W Chu 2018 24th International Conference on Pattern Recognition (ICPR), 2233-2238, 2018 | 10 | 2018 |
Auto-MAP: A DQN Framework for Exploring Distributed Execution Plans for DNN Workloads S Wang, Y Rong, S Fan, Z Zheng, LS Diao, G Long, J Yang, X Liu, W Lin arXiv preprint arXiv:2007.04069, 2020 | 8 | 2020 |
Graph-Adaptive Pruning for Efficient Inference of Convolutional Neural Networks M Wang, Q Zhang, J Yang, X Cui, W Lin arXiv preprint arXiv:1811.08589, 2018 | 7* | 2018 |
HyperGef: A Framework Enabling Efficient Fusion for Hypergraph Neural Network on GPUs YW Zhongming Yu, Guohao Dai, Shang Yang, Genghan Zhang, Hengrui Zhang ... MLSys, 2023 | 5* | 2023 |
ScaleFold: Reducing AlphaFold Initial Training Time to 10 Hours F Zhu, A Nowaczynski, R Li, J Xin, Y Song, M Marcinkiewicz, SB Eryilmaz, ... arXiv preprint arXiv:2404.11068, 2024 | 2 | 2024 |