Towards general purpose acceleration by exploiting common data-dependence forms

V Dadu, J Weng, S Liu, T Nowatzki - … of the 52nd Annual IEEE/ACM …, 2019 - dl.acm.org
With slowing technology scaling, specialized accelerators are increasingly attractive
solutions to continue expected generational scaling of performance. However, in order to …

A hybrid systolic-dataflow architecture for inductive matrix algorithms

J Weng, S Liu, Z Wang, V Dadu… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Dense linear algebra kernels are critical for wireless, and the oncoming proliferation of 5G
only amplifies their importance. Due to the inductive nature of many such algorithms …

Täkō: A polymorphic cache hierarchy for general-purpose optimization of data movement

BC Schwedock, P Yoovidhya, J Seibert… - Proceedings of the 49th …, 2022 - dl.acm.org
Current systems hide data movement from software behind the load-store interface.
Software's inability to observe and respond to data movement is the root cause of many …

A reschedulable dataflow-SIMD execution for increased utilization in CGRA cross-domain acceleration

C Yin, N Jing, J Jiang, Q Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
When a coarse-grained reconfigurable array (CGRA) architecture shifts toward cross-
domain acceleration, control flow and memory accesses often degrade the processing …

Catena: A near-threshold, sub-0.4-mW, 16-core programmable spatial array accelerator for the ultralow-power mobile and embedded Internet of Things

JP Cerqueira, TJ Repetti, Y Pu… - IEEE Journal of Solid …, 2020 - ieeexplore.ieee.org
In this article, we present Catena, a near-threshold voltage 16-core programmable spatial
array accelerator supporting workloads for ultralow-power (ULP) mobile and embedded …

Master of none acceleration: A comparison of accelerator architectures for analytical query processing

A Lottarini, JP Cerqueira, TJ Repetti… - Proceedings of the 46th …, 2019 - dl.acm.org
Hardware accelerators are one promising solution to contend with the end of Dennard
scaling and the slowdown of Moore's law. For mature workloads that are regular and have …

An elastic task scheduling scheme on coarse-grained reconfigurable architectures

L Chen, J Zhu, Y Deng, Z Li, J Chen… - … on Parallel and …, 2021 - ieeexplore.ieee.org
Coarse-grained reconfigurable architectures (CGRAs) are increasingly employed as domain-
specific accelerators due to their efficiency and flexibility. A CGRA typically relies on …

Subgraph decoupling and rescheduling for increased utilization in CGRA architecture

C Yin, Q Wang, J Jiang, W Sheng, G He… - … , Automation & Test …, 2021 - ieeexplore.ieee.org
When coarse-grained reconfigurable array (CGRA) architecture is shifting towards general-
purpose, some complex control flows, such as nested loop, conditional branch and data …

Leviathan: A Unified System for General-Purpose Near-Data Computing

BC Schwedock, N Beckmann - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
The rising cost of data movement poses a significant challenge to future computing systems.
The call to arms for novel data-centric systems has spawned a wave of near-data computing …

GEM: Ultra-Efficient Near-Memory Reconfigurable Acceleration for Read Mapping by Dividing and Predictive Scattering

L Chen, J Zhu, G Peng, M Liu, S Wei… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Read mapping, which maps billions of reads to a reference DNA, poses a significant
performance bottleneck in genomic analysis. Current accelerators for read mapping are …