Applications and techniques for fast machine learning in science
In this community review report, we discuss applications and techniques for fast machine
learning (ML) in science—the concept of integrating powerful ML methods into the real-time …
Vision-based autonomous bolt-looseness detection method for splice connections: Design, lab-scale evaluation, and field application
TC Huynh - Automation in Construction, 2021 - Elsevier
This study presents a novel autonomous vision-based bolt-looseness detection method for
splice bolted connections. The method is sequentially designed with a Faster regional …
Aha: An agile approach to the design of coarse-grained reconfigurable accelerators and compilers
With the slowing of Moore's law, computer architects have turned to domain-specific
hardware specialization to continue improving the performance and efficiency of computing …
Marvel: A data-centric approach for mapping deep learning operators on spatial accelerators
A spatial accelerator's efficiency depends heavily on both its mapper and cost models to
generate optimized mappings for various operators of DNN models. However, existing cost …
Unified buffer: Compiling image processing and machine learning applications to push-memory accelerators
Image processing and machine learning applications benefit tremendously from hardware
acceleration. Existing compilers target either FPGAs, which sacrifice power and performance …
An Efficient Hybrid Deep Learning Accelerator for Compact and Heterogeneous CNNs
Resource-efficient Convolutional Neural Networks (CNNs) are gaining more attention.
These CNNs have relatively low computational and memory requirements. A common …
Quantune: Post-training quantization of convolutional neural networks using extreme gradient boosting for fast deployment
To adopt convolutional neural networks (CNN) for a range of resource-constrained targets, it
is necessary to compress the CNN models by performing quantization, whereby precision …
Tensorflow to cloud FPGAs: Tradeoffs for accelerating deep neural networks
We present the first open-source TensorFlow to FPGA tool capable of running state-of-the-
art DNNs. Running TensorFlow on the Amazon cloud FPGA instances, we provide …
Fibha: fixed budget hybrid CNN accelerator
Seeking the “sweet spot” in the accuracy-efficiency trade-off is increasing the heterogeneity
of state-of-the-art Convolutional Neural Networks (CNNs). Such CNN models exhibit …
Transparent compiler and runtime specializations for accelerating managed languages on FPGAs
In recent years, heterogeneous computing has emerged as the vital way to increase
computers' performance and energy efficiency by combining diverse hardware devices …