Band: Coordinated multi-DNN inference on heterogeneous mobile processors

JS Jeong, J Lee, D Kim, C Jeon, C Jeong… - Proceedings of the 20th …, 2022 - dl.acm.org
The rapid development of deep learning algorithms, as well as innovative hardware
advancements, encourages multi-DNN workloads such as augmented reality applications …

Recent developments in low-power AI accelerators: A survey

C Åleskog, H Grahn, A Borg - Algorithms, 2022 - mdpi.com
As machine learning and AI continue to rapidly develop, and with the ever-closer end of
Moore's law, new avenues and novel ideas in architecture design are being created and …

Sparseloop: An analytical approach to sparse tensor accelerator modeling

YN Wu, PA Tsai, A Parashar, V Sze… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
In recent years, many accelerators have been proposed to efficiently process sparse tensor
algebra applications (e.g., sparse neural networks). However, these proposals are single …

Adaptable butterfly accelerator for attention-based NNs via hardware and algorithm co-design

H Fan, T Chau, SI Venieris, R Lee… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Attention-based neural networks have become pervasive in many AI tasks. Despite their
excellent algorithmic performance, the use of the attention mechanism and feedforward …

Highlight: Efficient and flexible DNN acceleration with hierarchical structured sparsity

YN Wu, PA Tsai, S Muralidharan, A Parashar… - Proceedings of the 56th …, 2023 - dl.acm.org
Due to complex interactions among various deep neural network (DNN) optimization
techniques, modern DNNs can have weights and activations that are dense or sparse with …

Reconfigurability, why it matters in AI tasks processing: A survey of reconfigurable AI chips

S Wei, X Lin, F Tu, Y Wang, L Liu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Nowadays, artificial intelligence (AI) technologies, especially deep neural networks (DNNs),
play a vital role in solving many problems in both academia and industry. In order to …

A multi-mode 8k-MAC HW-utilization-aware neural processing unit with a unified multi-precision datapath in 4-nm flagship mobile SoC

JS Park, C Park, S Kwon, T Jeon… - IEEE Journal of Solid …, 2022 - ieeexplore.ieee.org
This article presents an 8k-multiply-accumulate (MAC) neural processing unit (NPU) in 4-nm
mobile system-on-chip (SoC). The unified multi-precision MACs support from integer (INT) …

Energy and emissions of machine learning on smartphones vs. the cloud

D Patterson, JM Gilbert, M Gruteser, E Robles… - Communications of the …, 2024 - dl.acm.org
Communications of the ACM, Volume 67, Number 2 (2024), Pages 86-97.

NN-LUT: Neural approximation of non-linear operations for efficient transformer inference

J Yu, J Park, S Park, M Kim, S Lee, DH Lee… - Proceedings of the 59th …, 2022 - dl.acm.org
Non-linear operations such as GELU, layer normalization, and softmax are essential yet
costly building blocks of Transformer models. Several prior works simplified these …

Vegeta: Vertically-integrated extensions for sparse/dense GEMM tile acceleration on CPUs

G Jeong, S Damani, AR Bambhaniya… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Deep Learning (DL) acceleration support in CPUs has recently gained a lot of traction, with
several companies (Arm, Intel, IBM) announcing products with specialized matrix engines …