A configurable cloud-scale DNN processor for real-time AI

J Fowers, K Ovtcharov, M Papamichael… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
Interactive AI-powered services require low-latency evaluation of deep neural network
(DNN) models, aka "real-time AI". The growing demand for computationally expensive …

In-datacenter performance analysis of a tensor processing unit

NP Jouppi, C Young, N Patil, D Patterson… - Proceedings of the 44th …, 2017 - dl.acm.org
Many architects believe that major improvements in cost-energy-performance must now
come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor …

Sparse tensor core: Algorithm and hardware co-design for vector-wise sparse neural networks on modern GPUs

M Zhu, T Zhang, Z Gu, Y …

… on CGRA via hierarchical abstraction

D Wijerathne, Z Li, A Pathania, T Mitra… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Coarse-grained reconfigurable array (CGRA) has emerged as a promising hardware
accelerator due to the excellent balance between reconfigurability, performance, and energy …