Title, authors, venue | Cited by | Year |
Enabling efficient preemption for SIMT architectures with lightweight context switching. Z Lin, L Nyland, H Zhou. SC'16: Proceedings of the International Conference for High Performance …, 2016 | 52 | 2016 |
Accelerate GPU concurrent kernel execution by mitigating memory pipeline stalls. H Dai, Z Lin, C Li, C Zhao, F Wang, N Zheng, H Zhou. 2018 IEEE International Symposium on High Performance Computer Architecture …, 2018 | 51 | 2018 |
Automatic data placement into GPU on-chip memory resources. C Li, Y Yang, Z Lin, H Zhou. 2015 IEEE/ACM International Symposium on Code Generation and Optimization …, 2015 | 43 | 2015 |
Implementation and evaluation of deep neural networks (DNN) on mainstream heterogeneous systems. J Gu, M Zhu, Z Zhou, F Zhang, Z Lin, Q Zhang, M Breternitz. Proceedings of 5th Asia-Pacific Workshop on Systems, 1-7, 2014 | 37 | 2014 |
In-place zero-space memory protection for CNN. H Guan, L Ning, Z Lin, X Shen, H Zhou, SH Lim. Advances in Neural Information Processing Systems 32, 2019 | 34 | 2019 |
Scatter-and-gather revisited: High-performance side-channel-resistant AES on GPUs. Z Lin, U Mathur, H Zhou. Proceedings of the 12th Workshop on General Purpose Processing Using GPUs, 2-11, 2019 | 20 | 2019 |
Selectively GPU cache bypassing for un-coalesced loads. C Zhao, F Wang, Z Lin, H Zhou, N Zheng. 2016 IEEE 22nd International Conference on Parallel and Distributed Systems …, 2016 | 15 | 2016 |
Exploring memory persistency models for GPUs. Z Lin, M Alshboul, Y Solihin, H Zhou. 2019 28th International Conference on Parallel Architectures and Compilation …, 2019 | 14 | 2019 |
Coordinated CTA combination and bandwidth partitioning for GPU concurrent kernel execution. Z Lin, H Dai, M Mantor, H Zhou. ACM Transactions on Architecture and Code Optimization (TACO) 16 (3), 1-27, 2019 | 14 | 2019 |
GPU performance vs. thread-level parallelism: Scalability analysis and a novel way to improve TLP. Z Lin, M Mantor, H Zhou. ACM Transactions on Architecture and Code Optimization (TACO) 15 (1), 1-21, 2018 | 11 | 2018 |
GLES: A practical GPGPU optimizing compiler using data sharing and thread coarsening. Z Lin, X Gao, H Wan, B Jiang. Languages and Compilers for Parallel Computing: 27th International Workshop …, 2015 | 7 | 2015 |
The demand for a sound baseline in GPU memory architecture research. H Dai, C Li, Z Lin, H Zhou. Proceedings of the Workshop on Duplicating, Deconstructing and Debunking (WDDD), 2017 | 4 | 2017 |
Poster: Accelerate GPU concurrent kernel execution by mitigating memory pipeline stalls. H Dai, Z Lin, C Li, C Zhao, F Wang, N Zheng, H Zhou. 2017 26th International Conference on Parallel Architectures and Compilation …, 2017 | 3 | 2017 |