A survey on federated learning for resource-constrained IoT devices

A Imteaj, U Thakker, S Wang, J Li… - IEEE Internet of Things …, 2021‏ - ieeexplore.ieee.org
Federated learning (FL) is a distributed machine learning strategy that generates a global
model by learning from multiple decentralized edge clients. FL enables on-device training …

A survey on efficient convolutional neural networks and hardware acceleration

D Ghimire, D Kil, S Kim - Electronics, 2022‏ - mdpi.com
Over the past decade, deep-learning-based representations have demonstrated remarkable
performance in academia and industry. The learning capability of convolutional neural …

Hardware architecture and software stack for PIM based on commercial DRAM technology: Industrial product

S Lee, S Kang, J Lee, H Kim, E Lee… - 2021 ACM/IEEE 48th …, 2021‏ - ieeexplore.ieee.org
Emerging applications such as deep neural network demand high off-chip memory
bandwidth. However, under stringent physical constraints of chip packages and system …

Sigma: A sparse and irregular gemm accelerator with flexible interconnects for dnn training

E Qin, A Samajdar, H Kwon, V Nadella… - … Symposium on High …, 2020‏ - ieeexplore.ieee.org
The advent of Deep Learning (DL) has radically transformed the computing industry across
the entire spectrum from algorithms to circuits. As myriad application domains embrace DL, it …

Machine learning at facebook: Understanding inference at the edge

CJ Wu, D Brooks, K Chen, D Chen… - … symposium on high …, 2019‏ - ieeexplore.ieee.org
At Facebook, machine learning provides a wide range of capabilities that drive many
aspects of user experience including ranking posts, content understanding, object detection …

Benchmarking a new paradigm: Experimental analysis and characterization of a real processing-in-memory system

J Gómez-Luna, I El Hajj, I Fernandez… - IEEE …, 2022‏ - ieeexplore.ieee.org
Many modern workloads, such as neural networks, databases, and graph processing, are
fundamentally memory-bound. For such workloads, the data movement between main …

A configurable cloud-scale DNN processor for real-time AI

J Fowers, K Ovtcharov, M Papamichael… - 2018 ACM/IEEE 45th …, 2018‏ - ieeexplore.ieee.org
Interactive AI-powered services require low-latency evaluation of deep neural network
(DNN) models-aka"" real-time AI"". The growing demand for computationally expensive …

In-datacenter performance analysis of a tensor processing unit

NP Jouppi, C Young, N Patil, D Patterson… - Proceedings of the 44th …, 2017‏ - dl.acm.org
Many architects believe that major improvements in cost-energy-performance must now
come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor …

PUMA: A programmable ultra-efficient memristor-based accelerator for machine learning inference

A Ankit, IE Hajj, SR Chalamalasetti, G Ndu… - Proceedings of the …, 2019‏ - dl.acm.org
Memristor crossbars are circuits capable of performing analog matrix-vector multiplications,
overcoming the fundamental energy efficiency limitations of digital logic. They have been …

Efficient processing of deep neural networks: A tutorial and survey

V Sze, YH Chen, TJ Yang, JS Emer - Proceedings of the IEEE, 2017‏ - ieeexplore.ieee.org
Deep neural networks (DNNs) are currently widely used for many artificial intelligence (AI)
applications including computer vision, speech recognition, and robotics. While DNNs …