Towards general purpose acceleration by exploiting common data-dependence forms
With slowing technology scaling, specialized accelerators are increasingly attractive
solutions to continue expected generational scaling of performance. However, in order to …
A hybrid systolic-dataflow architecture for inductive matrix algorithms
Dense linear algebra kernels are critical for wireless, and the oncoming proliferation of 5G
only amplifies their importance. Due to the inductive nature of many such algorithms …
Täkō: A polymorphic cache hierarchy for general-purpose optimization of data movement
BC Schwedock, P Yoovidhya, J Seibert… - Proceedings of the 49th …, 2022 - dl.acm.org
Current systems hide data movement from software behind the load-store interface.
Software's inability to observe and respond to data movement is the root cause of many …
A reschedulable dataflow-SIMD execution for increased utilization in CGRA cross-domain acceleration
C Yin, N Jing, J Jiang, Q Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
When a coarse-grained reconfigurable array (CGRA) architecture shifts toward cross-domain acceleration, control flow and memory accesses often degrade the processing …
Catena: A near-threshold, sub-0.4-mW, 16-core programmable spatial array accelerator for the ultralow-power mobile and embedded Internet of Things
In this article, we present Catena, a near-threshold voltage 16-core programmable spatial
array accelerator supporting workloads for ultralow-power (ULP) mobile and embedded …
Master of none acceleration: A comparison of accelerator architectures for analytical query processing
Hardware accelerators are one promising solution to contend with the end of Dennard
scaling and the slowdown of Moore's law. For mature workloads that are regular and have …
An elastic task scheduling scheme on coarse-grained reconfigurable architectures
Coarse-grained reconfigurable architectures (CGRAs) are increasingly employed as domain-specific accelerators due to their efficiency and flexibility. A CGRA typically relies on …
Subgraph decoupling and rescheduling for increased utilization in CGRA architecture
C Yin, Q Wang, J Jiang, W Sheng, G He… - … , Automation & Test …, 2021 - ieeexplore.ieee.org
When coarse-grained reconfigurable array (CGRA) architecture is shifting towards general-purpose, some complex control flows, such as nested loop, conditional branch and data …
Leviathan: A Unified System for General-Purpose Near-Data Computing
The rising cost of data movement poses a significant challenge to future computing systems.
The call to arms for novel data-centric systems has spawned a wave of near-data computing …
GEM: Ultra-Efficient Near-Memory Reconfigurable Acceleration for Read Mapping by Dividing and Predictive Scattering
Read mapping, which maps billions of reads to a reference DNA, poses a significant
performance bottleneck in genomic analysis. Current accelerators for read mapping are …