GPipe: Efficient training of giant neural networks using pipeline parallelism Y Huang, Y Cheng, A Bapna, O Firat, D Chen, M Chen, HJ Lee, J Ngiam, ... Advances in neural information processing systems 32, 2019 | 1863 | 2019 |
LaMDA: Language models for dialog applications R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ... arXiv preprint arXiv:2201.08239, 2022 | 1664 | 2022 |
GShard: Scaling giant models with conditional computation and automatic sharding D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ... arXiv preprint arXiv:2006.16668, 2020 | 1098 | 2020 |
MLPerf training benchmark P Mattson, C Cheng, G Diamos, C Coleman, P Micikevicius, D Patterson, ... Proceedings of Machine Learning and Systems 2, 336-349, 2020 | 360 | 2020 |
MapCG: Writing parallel program portable between CPU and GPU C Hong, D Chen, W Chen, W Zheng, H Lin Proceedings of the 19th international conference on Parallel architectures …, 2010 | 224 | 2010 |
Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... arXiv preprint arXiv:1902.08295, 2019 | 212 | 2019 |
Image classification at supercomputer scale C Ying, S Kumar, D Chen, T Wang, Y Cheng arXiv preprint arXiv:1811.06992, 2018 | 162 | 2018 |
AutoFDO: Automatic feedback-directed optimization for warehouse-scale applications D Chen, DX Li, T Moseley Proceedings of the 2016 International Symposium on Code Generation and …, 2016 | 133 | 2016 |
GSPMD: general and scalable parallelization for ML computation graphs Y Xu, HJ Lee, D Chen, B Hechtman, Y Huang, R Joshi, M Krikun, ... arXiv preprint arXiv:2105.04663, 2021 | 130 | 2021 |
Renelito Delos Santos R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ... | 108 | 2022 |
Taming hardware event samples for FDO compilation D Chen, N Vachharajani, R Hundt, S Liao, V Ramasamy, P Yuan, W Chen, ... Proceedings of the 8th annual IEEE/ACM international symposium on Code …, 2010 | 91 | 2010 |
Overlap communication with dependent computation via decomposition in large deep learning models S Wang, J Wei, A Sabne, A Davis, B Ilbeyi, B Hechtman, D Chen, ... Proceedings of the 28th ACM International Conference on Architectural …, 2022 | 64 | 2022 |
Tree partition based parallel frequent pattern mining on shared memory systems D Chen, C Lai, W Hu, WG Chen, Y Zhang, W Zheng Proceedings 20th IEEE International Parallel & Distributed Processing …, 2006 | 54 | 2006 |
Taming hardware event samples for precise and versatile feedback directed optimizations D Chen, N Vachharajani, R Hundt, X Li, S Eranian, W Chen, W Zheng IEEE Transactions on Computers 62 (2), 376-389, 2011 | 50 | 2011 |
Scale MLPerf-0.6 models on Google TPU-v3 pods S Kumar, V Bitorff, D Chen, C Chou, B Hechtman, HJ Lee, N Kumar, ... arXiv preprint arXiv:1909.09756, 2019 | 42 | 2019 |
LaMDA: Language models for dialog applications AD Cohen, A Roberts, A Molina, A Butryna, A Jin, A Kulshreshtha, ... arXiv preprint arXiv:2201.08239, 2022 | 35 | 2022 |
Automatic cross-replica sharding of weight update in data-parallel training Y Xu, HJ Lee, D Chen, H Choi, B Hechtman, S Wang arXiv preprint arXiv:2004.13336, 2020 | 34 | 2020 |
Exploring the limits of Concurrency in ML Training on Google TPUs S Kumar, Y Wang, C Young, J Bradbury, N Kumar, D Chen, A Swing Proceedings of Machine Learning and Systems 3, 81-92, 2021 | 22 | 2021 |
Feedback-directed optimizations in GCC with estimated edge profiles from hardware event sampling V Ramasamy, P Yuan, D Chen, R Hundt Proceedings of GCC Summit, 87-102, 2008 | 22 | 2008 |
Compile-time feedback-directed optimizations using estimated edge profiles from hardware-event sampling R Hundt, V Ramasamy, D Chen US Patent 8,387,026, 2013 | 20 | 2013 |