Mind the gap: Attainable data movement and operational intensity bounds for tensor algorithms

Q Huang, PA Tsai, JS Emer… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
The architectural design-space exploration (or DSE) process, whether manual or automated,
benefits greatly from knowing the limits of the metrics of interest in advance. Data movement …

Marvel: A data-centric approach for mapping deep learning operators on spatial accelerators

P Chatarasi, H Kwon, A Parashar, M Pellauer… - ACM Transactions on …, 2021 - dl.acm.org
A spatial accelerator's efficiency depends heavily on both its mapper and cost models to
generate optimized mappings for various operators of DNN models. However, existing cost …

Demystifying map space exploration for NPUs

SC Kao, A Parashar, PA Tsai… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Map Space Exploration is the problem of finding optimized mappings of a Deep Neural
Network (DNN) model on an accelerator. It is known to be extremely computationally …

ML processors are going multi-core: A performance dream or a scheduling nightmare?

M Verhelst, M Shi, L Mei - IEEE Solid-State Circuits Magazine, 2022 - ieeexplore.ieee.org
Applications of machine learning (ML) increasingly penetrate into our daily routines, our
work, and our living environments. In this way, more complex machine intelligence …

COAC: Cross-layer optimization of accelerator configurability for efficient CNN processing

S Colleman, M Shi, M Verhelst - IEEE Transactions on Very …, 2023 - ieeexplore.ieee.org
To achieve high accuracy, convolutional neural networks (CNNs) are increasingly growing
in complexity and diversity in layer types and topologies. This makes it very challenging to …

Multiobjective end-to-end design space exploration of parameterized DNN accelerators

E Russo, M Palesi, D Patti… - IEEE Internet of …, 2022 - ieeexplore.ieee.org
Deep neural network (DNN) hardware accelerators enable the execution of complex DNN
inferences on resource-constrained IoT devices. Inference performance and energy figures …

Building a domain-specific compiler for emerging processors with a reusable approach

M Li, Y Liu, B Chen, H Yang, Z Luan, D Qian - Science China Information …, 2024 - Springer
High-performance computing and deep learning domains have been motivating the design
of domain-specific processors. Although these processors can provide promising …

Sparsepipe: Sparse Inter-operator Dataflow Architecture with Cross-Iteration Reuse

Y Zhang, PA Tsai, HW Tseng - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Sparse Tensor Algebra (STA) applications are limited by data movement and can benefit
from better data reuse. Prior research has focused on intra-operator data reuse, such as …

HiEval: A scheduling performance estimation approach for spatial accelerators via hierarchical abstraction

Z Wu, Y Hu, N Li, W Lu, Y Liu - Journal of Systems Architecture, 2024 - Elsevier
Workload scheduling strategy, referred to as mapping, plays a vital role in exploring
hardware spatial accelerator performance. Evaluating all possible mappings experimentally …

LAHypergraph: Parallel Hypergraph Analytics in the Language of Linear Algebra

L Guo, J Firoz, G Kestor - SIAM Conference on Applied and Computational …, 2023 - SIAM
Hypergraphs are recently emerging as a robust set-theoretical mathematical tool that can
faithfully model higher-order relationships among the entities in a dataset. To design efficient …