The future of computing beyond Moore's Law

J Shalf - Philosophical Transactions of the Royal Society …, 2020 - royalsocietypublishing.org
Moore's Law is a techno-economic model that has enabled the information technology
industry to double the performance and functionality of digital electronics roughly every 2 …

A full-stack search technique for domain optimized deep learning accelerators

D Zhang, S Huda, E Songhori, K Prabhu, Q Le… - Proceedings of the 27th …, 2022 - dl.acm.org
The rapidly-changing deep learning landscape presents a unique opportunity for building
inference accelerators optimized for specific datacenter-scale workloads. We propose Full …

Towards general purpose acceleration by exploiting common data-dependence forms

V Dadu, J Weng, S Liu, T Nowatzki - … of the 52nd Annual IEEE/ACM …, 2019 - dl.acm.org
With slowing technology scaling, specialized accelerators are increasingly attractive
solutions to continue expected generational scaling of performance. However, in order to …

Tileflow: A framework for modeling fusion dataflow via tree-based analysis

S Zheng, S Chen, S Gao, L Jia, G Sun… - Proceedings of the 56th …, 2023 - dl.acm.org
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …

Evaluating emerging AI/ML accelerators: IPU, RDU, and NVIDIA/AMD GPUs

H Peng, C Ding, T Geng, S Choudhury… - Companion of the 15th …, 2024 - dl.acm.org
The relentless advancement of artificial intelligence (AI) and machine learning (ML)
applications necessitates the development of specialized hardware accelerators capable of …

FCNNLib: A flexible convolution algorithm library for deep learning on FPGAs

Y Liang, Q …

… applications to dataflow-based coarse-grained reconfigurable array

AXM Chang, P Khopkar, B Romanous… - arXiv preprint arXiv …, 2022 - arxiv.org
The Streaming Engine (SE) is a Coarse-Grained Reconfigurable Array which provides
programming flexibility and high-performance with energy efficiency. An application program …

EA4RCA: Efficient AIE accelerator design framework for regular Communication-Avoiding Algorithm

W Zhang, Y Liu, T Zang, Z Bao - ACM Transactions on Architecture and …, 2024 - dl.acm.org
With the introduction of the Adaptive Intelligence Engine (AIE), the Versal Adaptive Compute
Acceleration Platform (Versal ACAP) has garnered great attention. However, the current …

Squaring the circle: Executing Sparse Matrix Computations on FlexTPU - A TPU-Like Processor

X He, KY Chen, S Feng, HS Kim, D Blaauw… - Proceedings of the …, 2022 - dl.acm.org
Systolic arrays have been successful to accelerate dense linear algebra for deep neural
networks (DNNs), but cannot handle sparse computations efficiently. Though early attempts …

DAP: A 507-GMACs/J 256-Core Domain Adaptive Processor for Wireless Communication and Linear Algebra Kernels in 12-nm FINFET

KY Chen, CS Yang, YH Sun, CW Tseng… - IEEE Journal of Solid …, 2024 - ieeexplore.ieee.org
We present domain adaptive processor (DAP), a programmable systolic-array processor
designed for wireless communication and linear algebra workloads. DAP uses a globally …