Hierarchical Roofline Analysis for GPUs: Accelerating Performance Optimization for the NERSC‐9 Perlmutter System C Yang, T Kurth, S Williams Concurrency and Computation: Practice and Experience 32 (20), e5547, 2020 | 73 | 2020 |
An Empirical Roofline Methodology for Quantitatively Assessing Performance Portability C Yang, R Gayatri, T Kurth, P Basu, Z Ronaghi, A Adetokunbo, B Friesen, ... 2018 IEEE/ACM International Workshop on Performance, Portability and …, 2018 | 51 | 2018 |
A Case Study for Performance Portability using OpenMP 4.5 R Gayatri, C Yang, T Kurth, J Deslippe Fifth Workshop on Accelerator Programming Using Directives (WACCPD) 2018, 2018 | 45 | 2018 |
Accelerating Large-Scale Excited-State GW Calculations on Leadership HPC Systems M Del Ben, C Yang, Z Li, H Felipe, S Louie, J Deslippe SC20: International Conference for High Performance Computing, Networking …, 2020 | 35 | 2020 |
Timemory: modular performance analysis for HPC JR Madsen, MG Awan, H Brunie, J Deslippe, R Gayatri, L Oliker, Y Wang, ... High Performance Computing: 35th International Conference, ISC High …, 2020 | 34 | 2020 |
Hierarchical Roofline Analysis: How to Collect Data Using Performance Tools on Intel CPUs and NVIDIA GPUs C Yang arXiv preprint arXiv:2009.02449, 2020 | 28 | 2020 |
A Novel Multi-Level Integrated Roofline Model Approach for Performance Characterization T Koskela, Z Matveev, C Yang, A Adedoyin, R Belenov, P Thierry, Z Zhao, ... International Conference on High Performance Computing, 226-245, 2018 | 27 | 2018 |
Hierarchical Roofline Performance Analysis for Deep Learning Applications C Yang, Y Wang, S Farrell, T Kurth, S Williams arXiv preprint arXiv:2009.05257, 2020 | 26 | 2020 |
Time-Based Roofline for Deep Learning Performance Analysis Y Wang, C Yang, S Farrell, Y Zhang, T Kurth, S Williams arXiv preprint arXiv:2009.04598, 2020 | 19 | 2020 |
Outcomes of OpenMP hackathon: OpenMP application experiences with the offloading model (Part II) B Chapman, B Pham, C Yang, C Daley, C Bertoni, D Kulkarni, ... OpenMP: Enabling Massive Node-Level Parallelism: 17th International Workshop …, 2021 | 14 | 2021 |
8 Steps to 3.7 TFLOP/s on NVIDIA V100 GPU: Roofline Analysis and Other Tricks C Yang arXiv preprint arXiv:2008.11326, 2020 | 11 | 2020 |
Accelerate Science on Perlmutter with NERSC C Yang, J Deslippe Bulletin of the American Physical Society 65 (Peer-Reviewed Talk), 2020 | 10 | 2020 |
A Factor Graph Approach to Exploiting Cyclic Prefix for Equalization in OFDM Systems CJ Yang, Q Guo, DD Huang, S Nordholm IEEE Transactions on Communications 61 (12), 4972-4983, 2013 | 10 | 2013 |
Toward Automated Application Profiling on Cray Systems C Yang, B Friesen, T Kurth, B Cook, S Williams Cray User Group (CUG), 2018 | 9 | 2018 |
Outcomes of OpenMP hackathon: OpenMP application experiences with the offloading model S Pophale, D Oryspayev, B Chapman, B Pham, C Yang, C Daley, ... Brookhaven National Lab. (BNL), Upton, NY (United States), 2021 | 8 | 2021 |
An Extended Roofline Performance Model with PCI-E and Network Ceilings AS Dufek, JR Deslippe, PT Lin, CJ Yang, BG Cook, J Madsen 2021 International Workshop on Performance Modeling, Benchmarking and …, 2021 | 6 | 2021 |
Rahulkumar Gayatri C Yang Thorsten Kurth, Protonu Basu, Zahra Ronaghi, Adedoyin Adetokunbo, Brian …, 2018 | 6 | 2018 |
A Metric for Evaluating Supercomputer Performance in the Era of Extreme Heterogeneity B Austin, C Daley, D Doerfler, J Deslippe, B Cook, B Friesen, T Kurth, ... | 6 | 2018 |
Empirical Roofline Toolkit C Yang URL: https://bitbucket.org/berkeleylab/cs-roofline-toolkit 2, 2021 | 5 | 2021 |
SC’20: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Atlanta, GA, USA, 9–19 November 2020 MD Ben, C Yang, Z Li, FH da Jornada, SG Louie, J Deslippe IEEE Press: New York, NY, USA, 2020 | 5 | 2020 |