OuterSPACE: An outer product based sparse matrix multiplication accelerator
Sparse matrices are widely used in graph and data analytics, machine learning, engineering
and scientific applications. This paper describes and analyzes OuterSPACE, an accelerator …
Gamma: Leveraging Gustavson's algorithm to accelerate sparse matrix multiplication
Sparse matrix-sparse matrix multiplication (spMspM) is at the heart of a wide range of
scientific and machine learning applications. spMspM is inefficient on general-purpose …
PANIC: A High-Performance programmable NIC for multi-tenant networks
Programmable NICs have diverse uses, and there is need for a NIC platform that can offload
computation from multiple co-resident applications to many different types of substrates …
Sparse-TPU: Adapting systolic arrays for sparse matrices
While systolic arrays are widely used for dense-matrix operations, they are seldom used for
sparse-matrix operations. In this paper, we show how a systolic array of Multiply-and …
Spada: Accelerating sparse matrix multiplication with adaptive dataflow
Sparse matrix-matrix multiplication (SpGEMM) is widely used in many scientific and deep
learning applications. The highly irregular structures of SpGEMM limit its performance and …
Slim NoC: A low-diameter on-chip network topology for high energy efficiency and scalability
Emerging chips with hundreds and thousands of cores require networks with unprecedented
energy/area efficiency and scalability. To address this, we propose Slim NoC (SN): a new on …
SparseAdapt: Runtime control for sparse linear algebra on a reconfigurable accelerator
Dynamic adaptation is a post-silicon optimization technique that adapts the hardware to
workload phases. However, current adaptive approaches are oblivious to implicit phases …
Scalability of broadcast performance in wireless network-on-chip
Networks-on-Chip (NoCs) are currently the paradigm of choice to interconnect the cores of a
chip multiprocessor. However, conventional NoCs may not suffice to fulfill the on-chip …
Transmuter: Bridging the efficiency gap using memory and dataflow reconfiguration
With the end of Dennard scaling and Moore's law, it is becoming increasingly difficult to build
hardware for emerging applications that meet power and performance targets, while …
Cooperative computing techniques for a deeply fused and heterogeneous many-core processor architecture
F Zheng, HL Li, H Lv, F Guo, XH Xu, XH Xie - Journal of Computer Science …, 2015 - Springer
Due to advances in semiconductor techniques, many-core processors have been widely
used in high performance computing. However, many applications still cannot be carried out …