Enabling resource-efficient AIoT system with cross-level optimization: A survey

S Liu, B Guo, C Fang, Z Wang, S Luo… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The emerging field of artificial intelligence of things (AIoT, AI+IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …

Enable deep learning on mobile devices: Methods, systems, and applications

H Cai, J Lin, Y Lin, Z Liu, H Tang, H Wang… - ACM Transactions on …, 2022 - dl.acm.org
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial
intelligence (AI), including computer vision, natural language processing, and speech …

Gemmini: Enabling systematic deep-learning architecture evaluation via full-stack integration

H Genc, S Kim, A Amid, A Haj-Ali, V Iyer… - 2021 58th ACM/IEEE …, 2021 - ieeexplore.ieee.org
DNN accelerators are often developed and evaluated in isolation without considering the
cross-stack, system-level effects in real-world environments. This makes it difficult to …

An overview of sparsity exploitation in CNNs for on-device intelligence with software-hardware cross-layer optimizations

S Kang, G Park, S Kim, S Kim, D Han… - IEEE Journal on …, 2021 - ieeexplore.ieee.org
This paper presents a detailed overview of sparsity exploitation in deep neural network
(DNN) accelerators. Despite the algorithmic advancements which drove DNNs to become …

Efficient N:M Sparse DNN Training Using Algorithm, Architecture, and Dataflow Co-Design

C Fang, W Sun, A Zhou, Z Wang - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Sparse training is one of the promising techniques to reduce the computational cost of deep
neural networks (DNNs) while retaining high accuracy. In particular, N:M fine-grained …
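For context on the pattern this entry targets: in N:M fine-grained sparsity, at most N of every M consecutive weights are nonzero (e.g., 2:4). Below is a minimal NumPy sketch of magnitude-based N:M pruning; the function name `nm_prune` is illustrative and is not the paper's co-designed training algorithm, which prunes during training alongside architecture and dataflow support.

```python
import numpy as np

def nm_prune(weights, n=2, m=4):
    """Zero all but the n largest-magnitude values in each group of m
    consecutive weights (N:M fine-grained structured sparsity)."""
    w = np.asarray(weights, dtype=float).copy()
    groups = w.reshape(-1, m)  # groups of m consecutive weights
    # Indices of the (m - n) smallest-magnitude entries in each group.
    drop = np.argsort(np.abs(groups), axis=1)[:, : m - n]
    np.put_along_axis(groups, drop, 0.0, axis=1)
    return groups.reshape(np.asarray(weights).shape)

w = np.array([0.9, -0.1, 0.3, -0.7, 0.2, 0.8, -0.05, 0.4])
pruned = nm_prune(w, n=2, m=4)  # each group of 4 keeps its 2 largest magnitudes
```

With 2:4 sparsity every group retains exactly half its entries, which is what lets hardware skip the zeroed operands with a fixed, predictable layout.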

DDC-PIM: Efficient algorithm/architecture co-design for doubling data capacity of SRAM-based processing-in-memory

C Duan, J Yang, X He, Y Qi, Y Wang… - … on Computer-Aided …, 2023 - ieeexplore.ieee.org
Processing-in-memory (PIM), as a novel computing paradigm, provides significant
performance benefits from the aspect of effective data movement reduction. SRAM-based …

Efficient-Grad: Efficient training deep convolutional neural networks on edge devices with gradient optimizations

Z Hong, CP Yue - ACM Transactions on Embedded Computing Systems …, 2022 - dl.acm.org
With the proliferation of mobile devices, the distributed learning approach, enabling model
training with decentralized data, has attracted great interest from researchers. However, the …

HW-Adam: FPGA-based accelerator for adaptive moment estimation

W Zhang, L Niu, D Zhang, G Wang, FUD Farrukh… - Electronics, 2023 - mdpi.com
The selection of the optimizer is critical for convergence in the field of on-chip training. As
a second-moment optimizer, adaptive moment estimation (ADAM) shows a significant …
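For reference, the standard Adam update that such accelerators implement in hardware can be sketched in a few lines of NumPy; this is the textbook algorithm, not the HW-Adam hardware mapping itself, and the function name `adam_step` is illustrative.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on a parameter tensor.

    m, v: running first- and second-moment estimates of the gradient.
    t:    1-based step count, used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (uncentered variance)
    m_hat = m / (1 - beta1 ** t)              # bias-corrected estimates
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

p, m, v = np.array(1.0), 0.0, 0.0
p, m, v = adam_step(p, grad=np.array(2.0), m=m, v=v, t=1)
```

The per-element square root and division in the update are the costly operations that FPGA designs typically approximate or pipeline.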

Energy-efficient DNN training processors on micro-AI systems

D Han, S Kang, S Kim, J Lee… - IEEE Open Journal of the …, 2022 - ieeexplore.ieee.org
Many edge/mobile devices are now able to utilize deep neural networks (DNNs) thanks to
the development of mobile DNN accelerators. Mobile DNN accelerators overcame the …

THETA: A high-efficiency training accelerator for DNNs with triple-side sparsity exploration

J Lu, J Huang, Z Wang - … on Very Large Scale Integration (VLSI …, 2022 - ieeexplore.ieee.org
Training deep neural networks (DNNs) on edge devices has attracted increasing attention in
real-world applications for domain adaptation and privacy protection. However, deploying …