DBAR: an efficient routing algorithm to support multiple concurrent applications in networks-on-chip S Ma, N Enright Jerger, Z Wang Proceedings of the 38th annual international symposium on Computer …, 2011 | 227 | 2011 |
Whole packet forwarding: Efficient design of fully adaptive routing algorithms for networks-on-chip S Ma, NE Jerger, Z Wang IEEE International Symposium on High-Performance Computer Architecture, 1-12, 2012 | 112 | 2012 |
Low-cost binary128 floating-point FMA unit design with SIMD support L Huang, S Ma, L Shen, Z Wang, N Xiao IEEE Transactions on Computers 61 (5), 745-751, 2011 | 54 | 2011 |
Supporting efficient collective communication in NoCs S Ma, NE Jerger, Z Wang IEEE International Symposium on High-Performance Computer Architecture, 1-12, 2012 | 50 | 2012 |
Leaving one slot empty: Flit bubble flow control for torus cache-coherent NoCs S Ma, Z Wang, Z Liu, NE Jerger IEEE Transactions on Computers 64 (3), 763-777, 2013 | 48 | 2013 |
A Survey of Design and Optimization for Systolic Array-based DNN Accelerators R Xu, S Ma, Y Guo, D Li ACM Computing Surveys 56 (1), 1-37, 2023 | 41 | 2023 |
A high performance reliable NoC router L Wang, S Ma, C Li, W Chen, Z Wang Integration 58, 583-592, 2017 | 38 | 2017 |
Novel flow control for fully adaptive routing in cache-coherent NoCs S Ma, Z Wang, NE Jerger, L Shen, N Xiao IEEE Transactions on Parallel and Distributed Systems 25 (9), 2397-2407, 2013 | 38 | 2013 |
Configurable multi-directional systolic array architecture for convolutional neural networks R Xu, S Ma, Y Wang, X Chen, Y Guo ACM Transactions on Architecture and Code Optimization (TACO) 18 (4), 1-24, 2021 | 33 | 2021 |
Heterogeneous systolic array architecture for compact CNNs hardware accelerators R Xu, S Ma, Y Wang, Y Guo, D Li, Y Qiao IEEE Transactions on Parallel and Distributed Systems 33 (11), 2860-2871, 2021 | 28 | 2021 |
Networks-on-chip: from implementations to programming paradigms S Ma, L Huang, M Lai, W Shi Morgan Kaufmann, 2014 | 28 | 2014 |
SIF: Overcoming the limitations of SIMD devices via implicit permutation L Huang, L Shen, Z Wang, W Shi, N Xiao, S Ma HPCA-16 2010 The Sixteenth International Symposium on High-Performance …, 2010 | 28 | 2010 |
A low-cost conflict-free NoC for GPGPUs X Zhao, S Ma, Y Liu, L Eeckhout, Z Wang Proceedings of the 53rd Annual Design Automation Conference, 1-6, 2016 | 23 | 2016 |
A heterogeneous low-cost and low-latency ring-chain network for GPGPUs X Zhao, S Ma, C Li, L Eeckhout, Z Wang 2016 IEEE 34th International Conference on Computer Design (ICCD), 472-479, 2016 | 20 | 2016 |
CMSA: Configurable multi-directional systolic array for convolutional neural networks R Xu, S Ma, Y Wang, Y Guo 2020 IEEE 38th International Conference on Computer Design (ICCD), 494-497, 2020 | 19 | 2020 |
Priority-based PCIe scheduling for multi-tenant multi-GPU systems C Li, Y Sun, L Jin, L Xu, Z Cao, P Fan, D Kaeli, S Ma, Y Guo, J Yang IEEE Computer Architecture Letters 18 (2), 157-160, 2019 | 17 | 2019 |
RHS-TRNG: A resilient high-speed true random number generator based on STT-MTJ device S Fu, T Li, C Zhang, H Li, S Ma, J Zhang, R Zhang, L Wu IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2023 | 16 | 2023 |
HeSA: Heterogeneous systolic array architecture for compact CNNs hardware accelerators R Xu, S Ma, Y Wang, Y Guo 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), 657-662, 2021 | 16 | 2021 |
A comprehensive comparison between virtual cut-through and wormhole routers for cache coherent Network on-Chips P Wang, S Ma, H Lu, Z Wang IEICE Electronics Express 11 (14), 20140496-20140496, 2014 | 14 | 2014 |
Coordinated DMA: improving the DRAM access efficiency for matrix multiplication S Ma, Z Liu, S Chen, L Huang, Y Guo, Z Wang, M Zhang IEEE Transactions on Parallel and Distributed Systems 30 (10), 2148-2164, 2019 | 13 | 2019 |