Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding S Han, H Mao, WJ Dally arXiv preprint arXiv:1510.00149, 2015 | 11294 | 2015 |
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size FN Iandola, S Han, MW Moskewicz, K Ashraf, WJ Dally, K Keutzer arXiv preprint arXiv:1602.07360, 2016 | 10786 | 2016 |
Learning both weights and connections for efficient neural networks S Han, J Pool, J Tran, W Dally Advances in neural information processing systems 28, 2015 | 8537 | 2015 |
Principles and practices of interconnection networks WJ Dally, BP Towles Elsevier, 2004 | 4937 | 2004 |
Route packets, not wires: on-chip interconnection networks WJ Dally, B Towles Proceedings of the 38th annual design automation conference, 684-689, 2001 | 4840 | 2001 |
EIE: Efficient inference engine on compressed deep neural network S Han, X Liu, H Mao, J Pu, A Pedram, MA Horowitz, WJ Dally ACM SIGARCH Computer Architecture News 44 (3), 243-254, 2016 | 3356 | 2016 |
Deadlock-free message routing in multiprocessor interconnection networks WJ Dally, CL Seitz IEEE Transactions on computers C-36 (5), 547-553, 1987 | 3091 | 1987 |
Virtual-channel flow control WJ Dally IEEE Transactions on Parallel and Distributed systems 3 (2), 194-205, 1992 | 1985 | 1992 |
Deep gradient compression: Reducing the communication bandwidth for distributed training Y Lin, S Han, H Mao, Y Wang, WJ Dally arXiv preprint arXiv:1712.01887, 2017 | 1704 | 2017 |
SCNN: An accelerator for compressed-sparse convolutional neural networks A Parashar, M Rhu, A Mukkara, A Puglielli, R Venkatesan, B Khailany, ... ACM SIGARCH computer architecture news 45 (2), 27-40, 2017 | 1521 | 2017 |
Performance analysis of k-ary n-cube interconnection networks WJ Dally IEEE transactions on Computers 39 (6), 775-785, 1990 | 1486 | 1990 |
The GPU computing era J Nickolls, WJ Dally IEEE micro 30 (2), 56-69, 2010 | 1409 | 2010 |
Digital systems engineering WJ Dally, JW Poulton Cambridge university press, 1998 | 1371 | 1998 |
The torus routing chip WJ Dally, CL Seitz Distributed computing 1, 187-196, 1986 | 1370* | 1986 |
Trained ternary quantization C Zhu, S Han, H Mao, WJ Dally arXiv preprint arXiv:1612.01064, 2016 | 1366 | 2016 |
Memory access scheduling S Rixner, WJ Dally, UJ Kapasi, P Mattson, JD Owens ACM SIGARCH Computer Architecture News 28 (2), 128-138, 2000 | 1361 | 2000 |
A detailed and flexible cycle-accurate network-on-chip simulator N Jiang, DU Becker, G Michelogiannakis, J Balfour, B Towles, DE Shaw, ... 2013 IEEE international symposium on performance analysis of systems and …, 2013 | 920 | 2013 |
ESE: Efficient speech recognition engine with sparse LSTM on FPGA S Han, J Kang, H Mao, Y Hu, X Li, Y Li, D Xie, H Luo, S Yao, Y Wang, ... Proceedings of the 2017 ACM/SIGDA international symposium on field …, 2017 | 864 | 2017 |
GPUs and the future of parallel computing SW Keckler, WJ Dally, B Khailany, M Garland, D Glasco IEEE micro 31 (5), 7-17, 2011 | 858 | 2011 |
Deadlock-free adaptive routing in multicomputer networks using virtual channels WJ Dally, H Aoki IEEE transactions on Parallel and Distributed Systems 4 (4), 466-475, 1993 | 844 | 1993 |