OuterSPACE: An outer product based sparse matrix multiplication accelerator

S Pal, J Beaumont, DH Park… - … Symposium on High …, 2018 - ieeexplore.ieee.org
Sparse matrices are widely used in graph and data analytics, machine learning, engineering
and scientific applications. This paper describes and analyzes OuterSPACE, an accelerator …

Gamma: Leveraging Gustavson's algorithm to accelerate sparse matrix multiplication

G Zhang, N Attaluri, JS Emer, D Sanchez - Proceedings of the 26th ACM …, 2021 - dl.acm.org
Sparse matrix-sparse matrix multiplication (spMspM) is at the heart of a wide range of
scientific and machine learning applications. spMspM is inefficient on general-purpose …

Co-designing accelerators and SoC interfaces using gem5-Aladdin

YS Shao, SL Xi, V Srinivasan, GY Wei… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
Increasing demand for power-efficient, high-performance computing has spurred a growing
number and diversity of hardware accelerators in mobile and server Systems on Chip …

Buffets: An efficient and composable storage idiom for explicit decoupled data orchestration

M Pellauer, YS Shao, J Clemons, N Crago… - Proceedings of the …, 2019 - dl.acm.org
Accelerators spend significant area and effort on custom on-chip buffering. Unfortunately,
these solutions are strongly tied to particular designs, hampering re-usability across other …

Zorua: A holistic approach to resource virtualization in GPUs

N Vijaykumar, K Hsieh, G Pekhimenko… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
This paper introduces a new resource virtualization framework, Zorua, that decouples the
programmer-specified resource usage of a GPU application from the actual allocation in the …

Capstan: A vector RDA for sparsity

A Rucker, M Vilim, T Zhao, Y Zhang… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
This paper proposes Capstan: a scalable, parallel-patterns-based, reconfigurable dataflow
accelerator (RDA) for sparse and dense tensor applications. Instead of designing for one …

Efficient GPU synchronization without scopes: Saying no to complex consistency models

MD Sinclair, J Alsop, SV Adve - … of the 48th International Symposium on …, 2015 - dl.acm.org
As GPUs have become increasingly general purpose, applications with more general
sharing patterns and fine-grained synchronization have started to emerge. Unfortunately …

SparseAdapt: Runtime control for sparse linear algebra on a reconfigurable accelerator

S Pal, A Amarnath, S Feng, M O'Boyle… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Dynamic adaptation is a post-silicon optimization technique that adapts the hardware to
workload phases. However, current adaptive approaches are oblivious to implicit phases …

Whirlpool: Improving dynamic cache management with static data classification

A Mukkara, N Beckmann, D Sanchez - ACM SIGARCH Computer …, 2016 - dl.acm.org
Cache hierarchies are increasingly non-uniform and difficult to manage. Several techniques,
such as scratchpads or reuse hints, use static information about how programs access data …

Morpheus: Extending the last level cache capacity in GPU systems using idle GPU core resources

S Darabi, M Sadrosadati, N Akbarzadeh… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are widely-used accelerators for data-parallel
applications. In many GPU applications, GPU memory bandwidth bottlenecks performance …