OuterSPACE: An outer product based sparse matrix multiplication accelerator

S Pal, J Beaumont, DH Park… - … Symposium on High …, 2018 - ieeexplore.ieee.org
Sparse matrices are widely used in graph and data analytics, machine learning, engineering
and scientific applications. This paper describes and analyzes OuterSPACE, an accelerator …

Gamma: Leveraging Gustavson's algorithm to accelerate sparse matrix multiplication

G Zhang, N Attaluri, JS Emer, D Sanchez - Proceedings of the 26th ACM …, 2021 - dl.acm.org
Sparse matrix-sparse matrix multiplication (spMspM) is at the heart of a wide range of
scientific and machine learning applications. spMspM is inefficient on general-purpose …

PANIC: A High-Performance programmable NIC for multi-tenant networks

J Lin, K Patel, BE Stephens, A Sivaraman… - … USENIX Symposium on …, 2020 - usenix.org
Programmable NICs have diverse uses, and there is need for a NIC platform that can offload
computation from multiple co-resident applications to many different types of substrates …

Sparse-TPU: Adapting systolic arrays for sparse matrices

X He, S Pal, A Amarnath, S Feng, DH Park… - Proceedings of the 34th …, 2020 - dl.acm.org
While systolic arrays are widely used for dense-matrix operations, they are seldom used for
sparse-matrix operations. In this paper, we show how a systolic array of Multiply-and …

Spada: Accelerating sparse matrix multiplication with adaptive dataflow

Z Li, J Li, T Chen, D Niu, H Zheng, Y Xie… - Proceedings of the 28th …, 2023 - dl.acm.org
Sparse matrix-matrix multiplication (SpGEMM) is widely used in many scientific and deep
learning applications. The highly irregular structures of SpGEMM limit its performance and …

Slim NoC: A low-diameter on-chip network topology for high energy efficiency and scalability

M Besta, SM Hassan, S Yalamanchili… - ACM SIGPLAN …, 2018 - dl.acm.org
Emerging chips with hundreds and thousands of cores require networks with unprecedented
energy/area efficiency and scalability. To address this, we propose Slim NoC (SN): a new on …

SparseAdapt: Runtime control for sparse linear algebra on a reconfigurable accelerator

S Pal, A Amarnath, S Feng, M O'Boyle… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Dynamic adaptation is a post-silicon optimization technique that adapts the hardware to
workload phases. However, current adaptive approaches are oblivious to implicit phases …

Scalability of broadcast performance in wireless network-on-chip

S Abadal, A Mestres, M Nemirovsky… - … on Parallel and …, 2016 - ieeexplore.ieee.org
Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a
chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip …

Transmuter: Bridging the efficiency gap using memory and dataflow reconfiguration

S Pal, S Feng, D Park, S Kim, A Amarnath… - Proceedings of the …, 2020 - dl.acm.org
With the end of Dennard scaling and Moore's law, it is becoming increasingly difficult to build
hardware for emerging applications that meet power and performance targets, while …

Cooperative computing techniques for a deeply fused and heterogeneous many-core processor architecture

F Zheng, HL Li, H Lv, F Guo, XH Xu, XH Xie - Journal of Computer Science …, 2015 - Springer
Due to advances in semiconductor techniques, many-core processors have been widely
used in high performance computing. However, many applications still cannot be carried out …