FPGA HLS today: successes, challenges, and opportunities

J Cong, J Lau, G Liu, S Neuendorffer, P Pan… - ACM Transactions on …, 2022 - dl.acm.org
The year 2011 marked an important transition for FPGA high-level synthesis (HLS), as it
went from prototyping to deployment. A decade later, in this article, we assess the progress …

Applications and techniques for fast machine learning in science

AMC Deiana, N Tran, J Agar, M Blott… - Frontiers in big …, 2022 - frontiersin.org
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …

TVM: An automated End-to-End optimizing compiler for deep learning

T Chen, T Moreau, Z Jiang, L Zheng, E Yan… - … USENIX Symposium on …, 2018 - usenix.org
There is an increasing need to bring machine learning to a wide diversity of hardware
devices. Current frameworks rely on vendor-specific operator libraries and optimize for a …

Deep bilateral learning for real-time image enhancement

M Gharbi, J Chen, JT Barron, SW Hasinoff… - ACM Transactions on …, 2017 - dl.acm.org
Performance is a critical challenge in mobile image processing. Given a reference imaging
pipeline, or even human-adjusted pairs of images, we seek to reproduce the enhancements …

Learning to optimize Halide with tree search and random programs

A Adams, K Ma, L Anderson, R Baghdadi… - ACM Transactions on …, 2019 - dl.acm.org
We present a new algorithm to automatically schedule Halide programs for high-
performance image processing and deep learning. We significantly improve upon the …

Differentiable compound optics and processing pipeline optimization for end-to-end camera design

E Tseng, A Mosleh, F Mannan, K St-Arnaud… - ACM Transactions on …, 2021 - dl.acm.org
Most modern commodity imaging systems we use directly for photography—or indirectly rely
on for downstream applications—employ optical systems of multiple lenses that must …

Spatial: A language and compiler for application accelerators

D Koeplinger, M Feldman, R Prabhakar… - Proceedings of the 39th …, 2018 - dl.acm.org
Industry is increasingly turning to reconfigurable architectures like FPGAs and CGRAs for
improved performance and energy efficiency. Unfortunately, adoption of these architectures …

TVM: end-to-end optimization stack for deep learning

T Chen, T Moreau, Z Jiang, H Shen… - arXiv preprint arXiv …, 2018 - dada.cs.washington.edu
Scalable frameworks, such as TensorFlow, MXNet, Caffe, and PyTorch drive the current
popularity and utility of deep learning. However, these frameworks are optimized for a …

Automatically scheduling Halide image processing pipelines

RT Mullapudi, A Adams, D Sharlet… - ACM Transactions on …, 2016 - dl.acm.org
The Halide image processing language has proven to be an effective system for authoring
high-performance image processing code. Halide programmers need only provide a high …

Energy-efficient abundant-data computing: The N3XT 1,000x

MMS Aly, M Gao, G Hills, CS Lee, G Pitner… - Computer, 2015 - ieeexplore.ieee.org
Next-generation information technologies will process unprecedented amounts of loosely
structured data that overwhelm existing computing systems. N3XT improves the energy …