Mind the gap: Attainable data movement and operational intensity bounds for tensor algorithms
The architectural design-space exploration (or DSE) process, whether manual or automated,
benefits greatly from knowing the limits of the metrics of interest in advance. Data movement …
Marvel: A data-centric approach for mapping deep learning operators on spatial accelerators
A spatial accelerator's efficiency depends heavily on both its mapper and cost models to
generate optimized mappings for various operators of DNN models. However, existing cost …
Demystifying map space exploration for NPUs
Map Space Exploration is the problem of finding optimized mappings of a Deep Neural
Network (DNN) model on an accelerator. It is known to be extremely computationally …
ML processors are going multi-core: A performance dream or a scheduling nightmare?
Applications of machine learning (ML) increasingly penetrate into our daily routines, our
work, and our living environments. In this way, more complex machine intelligence …
COAC: Cross-layer optimization of accelerator configurability for efficient CNN processing
To achieve high accuracy, convolutional neural networks (CNNs) are increasingly growing
in complexity and diversity in layer types and topologies. This makes it very challenging to …
Multiobjective end-to-end design space exploration of parameterized DNN accelerators
Deep neural network (DNN) hardware accelerators enable the execution of complex DNN
inferences on resource-constrained IoT devices. Inference performance and energy figures …
Building a domain-specific compiler for emerging processors with a reusable approach
High-performance computing and deep learning domains have been motivating the design
of domain-specific processors. Although these processors can provide promising …
Sparsepipe: Sparse Inter-operator Dataflow Architecture with Cross-Iteration Reuse
Sparse Tensor Algebra (STA) applications are limited by data movement and can benefit
from better data reuse. Prior research has focused on intra-operator data reuse, such as …
HiEval: A scheduling performance estimation approach for spatial accelerators via hierarchical abstraction
Z Wu, Y Hu, N Li, W Lu, Y Liu - Journal of Systems Architecture, 2024 - Elsevier
Workload scheduling strategy, referred to as mapping, plays a vital role in exploring
hardware spatial accelerator performance. Evaluating all possible mappings experimentally …
LAHypergraph: Parallel Hypergraph Analytics in the Language of Linear Algebra
Hypergraphs are recently emerging as a robust set-theoretical mathematical tool that can
faithfully model higher-order relationships among the entities in a dataset. To design efficient …