FPGA HLS today: successes, challenges, and opportunities

J Cong, J Lau, G Liu, S Neuendorffer, P Pan… - ACM Transactions on …, 2022 - dl.acm.org
The year 2011 marked an important transition for FPGA high-level synthesis (HLS), as it
went from prototyping to deployment. A decade later, in this article, we assess the progress …

The future of FPGA acceleration in datacenters and the cloud

C Bobda, JM Mbongue, P Chow, M Ewais… - ACM Transactions on …, 2022 - dl.acm.org
In this article, we survey existing academic and commercial efforts to provide Field-
Programmable Gate Array (FPGA) acceleration in datacenters and the cloud. The goal is a …

Ansor: Generating high-performance tensor programs for deep learning

L Zheng, C Jia, M Sun, Z Wu, CH Yu, A Haj-Ali… - … USENIX symposium on …, 2020 - usenix.org
High-performance tensor programs are crucial to guarantee efficient execution of deep
neural networks. However, obtaining performant tensor programs for different operators on …

Gemmini: Enabling systematic deep-learning architecture evaluation via full-stack integration

H Genc, S Kim, A Amid, A Haj-Ali, V Iyer… - 2021 58th ACM/IEEE …, 2021 - ieeexplore.ieee.org
DNN accelerators are often developed and evaluated in isolation without considering the
cross-stack, system-level effects in real-world environments. This makes it difficult to …

ScaleHLS: A new scalable high-level synthesis framework on multi-level intermediate representation

H Ye, C Hao, J Cheng, H Jeong… - … symposium on high …, 2022 - ieeexplore.ieee.org
High-level synthesis (HLS) has been widely adopted as it significantly improves the
hardware design productivity and enables efficient design space exploration (DSE). Existing …

TensorIR: An abstraction for automatic tensorized program optimization

S Feng, B Hou, H Jin, W Lin, J Shao, R Lai… - Proceedings of the 28th …, 2023 - dl.acm.org
Deploying deep learning models on various devices has become an important topic. The
wave of hardware specialization brings a diverse set of acceleration primitives for multi …

A TinyML platform for on-device continual learning with quantized latent replays

L Ravaglia, M Rusci, D Nadalini… - IEEE Journal on …, 2021 - ieeexplore.ieee.org
In the last few years, research and development on Deep Learning models & techniques for
ultra-low-power devices–in a word, TinyML–has mainly focused on a train-then-deploy …

Mix and match: A novel FPGA-centric deep neural network quantization framework

SE Chang, Y Li, M Sun, R Shi, HKH So… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) have achieved extraordinary performance in various
application domains. To support diverse DNN models, efficient implementations of DNN …

HASCO: Towards agile hardware and software co-design for tensor computation

Q Xiao, S Zheng, B Wu, P Xu, X Qian… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Tensor computations overwhelm traditional general-purpose computing devices due to the
large amounts of data and operations of the computations. They call for a holistic solution …

Hardware acceleration of sparse and irregular tensor computations of ML models: A survey and insights

S Dave, R Baghdadi, T Nowatzki… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Machine learning (ML) models are widely used in many important domains. For efficiently
processing these computational- and memory-intensive applications, tensors of these …