Reflections on 10 years of FloPoCo
F de Dinechin - 2019 IEEE 26th Symposium on Computer …, 2019 - ieeexplore.ieee.org
The FloPoCo open-source arithmetic core generator project started modestly in 2008 with a
few parametric floating-point cores. It has since evolved to become a framework for …
On the design of logarithmic multiplier using radix-4 Booth encoding
R Pilipović, P Bulić - IEEE Access, 2020 - ieeexplore.ieee.org
This paper proposes an energy-efficient approximate multiplier which combines radix-4
Booth encoding and logarithmic product approximation. Additionally, a datapath pruning …
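To make the two ingredients named in the abstract concrete, here is a minimal Python sketch of radix-4 Booth recoding of the multiplier and a Mitchell-style logarithmic product approximation. How Pilipović and Bulić combine and prune them is specific to the paper and not reproduced here; the bit widths and helper names are illustrative.

```python
def booth_radix4_digits(y: int, bits: int):
    """Recode y (treated as a `bits`-bit two's-complement value, `bits` even)
    into radix-4 Booth digits in {-2, -1, 0, 1, 2}, least significant first."""
    y &= (1 << bits) - 1
    digits, prev = [], 0                      # implicit y[-1] = 0
    for i in range(0, bits, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)    # value of y = sum(d * 4**i)
        prev = b1
    return digits

def booth_multiply(x: int, y: int, bits: int = 16) -> int:
    """Exact product via Booth partial products (one signed, shifted x per digit)."""
    return sum(d * x * 4 ** i for i, d in enumerate(booth_radix4_digits(y, bits)))

def mitchell_product(x: int, y: int) -> int:
    """Approximate x*y for positive ints using log2(v) ~ k + m, where v = 2**k * (1 + m)."""
    def log2_approx(v: int) -> float:
        k = v.bit_length() - 1
        return k + (v - (1 << k)) / (1 << k)
    s = log2_approx(x) + log2_approx(y)
    k, m = int(s), s - int(s)
    return round((1 << k) * (1 + m))          # antilog with the same approximation

print(booth_multiply(1234, 5678), 1234 * 5678)     # exact: 7006652 7006652
print(mitchell_product(1234, 5678), 1234 * 5678)   # approximate vs exact
```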
Next generation arithmetic for edge computing
Arithmetic is a key component and is ubiquitous in today's digital world, ranging from
embedded to high-performance computing systems. With machine learning at the fore in a …
Towards globally optimal design of multipliers for FPGAs
A Böttcher, M Kumm - IEEE Transactions on Computers, 2023 - ieeexplore.ieee.org
The design of a multiplier typically consists of three steps: (1) partial product generation, (2)
compressor tree design, and (3) the selection of the final adder. Conventionally, these three …
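The three steps are easy to model behaviourally. The sketch below assumes a plain unsigned multiplier: AND-gated partial product rows, a word-level carry-save (3:2 compressor) reduction down to two rows, and a single final carry-propagate addition. The global, joint optimisation of these steps that is the paper's actual contribution is not modelled.

```python
def partial_products(x: int, y: int, bits: int):
    """Step (1): one shifted row x * y_i * 2**i per multiplier bit."""
    return [(x << i) if (y >> i) & 1 else 0 for i in range(bits)]

def compress_3_to_2(a: int, b: int, c: int):
    """A word-level 3:2 compressor (row of full adders): a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry

def reduce_rows(rows):
    """Step (2): compress until only two carry-save words remain."""
    while len(rows) > 2:
        nxt, i = [], 0
        while i + 3 <= len(rows):
            nxt.extend(compress_3_to_2(rows[i], rows[i + 1], rows[i + 2]))
            i += 3
        nxt.extend(rows[i:])          # 0, 1 or 2 leftover rows pass through
        rows = nxt
    return rows

def multiply(x: int, y: int, bits: int = 8) -> int:
    s, c = (reduce_rows(partial_products(x, y, bits)) + [0, 0])[:2]
    return s + c                      # Step (3): the final carry-propagate adder

assert multiply(173, 201) == 173 * 201
```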
High-efficiency Compressor Trees for Latest AMD FPGAs
High-fan-in dot product computations are ubiquitous in highly relevant application domains,
such as signal processing and machine learning. Particularly, the diverse set of data formats …
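As a minimal model of why compressor trees suit high-fan-in dot products, the sketch below keeps the running sum in redundant carry-save form so that only one carry-propagate addition is needed at the very end. It is written as a chain for brevity, whereas real designs arrange the same 3:2 cells as a tree to shorten the critical path; the device-specific mapping to recent AMD FPGAs that the paper is about is not modelled.

```python
def csa(a: int, b: int, c: int):
    """Carry-save (3:2) addition: a + b + c == s + carry, with no carry propagation."""
    return a ^ b ^ c, ((a & b) | (a & c) | (b & c)) << 1

def dot_product(xs, ws):
    acc_s, acc_c = 0, 0
    for x, w in zip(xs, ws):
        acc_s, acc_c = csa(acc_s, acc_c, x * w)   # one 3:2 compressor stage per term
    return acc_s + acc_c                          # single final carry-propagate adder

xs = [3, 7, 2, 9, 4, 1, 8, 5]
ws = [2, 1, 6, 3, 5, 9, 4, 7]
assert dot_product(xs, ws) == sum(x * w for x, w in zip(xs, ws))
```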
Optimizing bit-serial matrix multiplication for reconfigurable computing
Matrix-matrix multiplication is a key computational kernel for numerous applications in
science and engineering, with ample parallelism and data locality that lends itself well to …
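The core arithmetic trick behind bit-serial matrix multiplication is easy to model in software. The sketch below assumes unsigned fixed-point operands: each matrix is split into binary bit planes, the binary matrices are multiplied (AND plus popcount at the hardware level), and the results are shifted and accumulated, so precision scales with the number of planes processed. Scheduling and the hardware mapping discussed in the paper are not modelled.

```python
import numpy as np

def bit_planes(m, bits):
    """Split a matrix of unsigned integers into `bits` binary matrices, LSB first."""
    return [(m >> b) & 1 for b in range(bits)]

def bit_serial_matmul(a, b, a_bits, b_bits):
    acc = np.zeros((a.shape[0], b.shape[1]), dtype=np.int64)
    for i, ap in enumerate(bit_planes(a, a_bits)):
        for j, bp in enumerate(bit_planes(b, b_bits)):
            acc += (ap @ bp) << (i + j)    # binary matmul, weighted by 2**(i+j)
    return acc

rng = np.random.default_rng(0)
a = rng.integers(0, 16, size=(4, 6))       # 4-bit unsigned entries
b = rng.integers(0, 16, size=(6, 3))
assert np.array_equal(bit_serial_matmul(a, b, 4, 4), a @ b)
```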
Reconfigurable convolutional kernels for neural networks on FPGAs
Convolutional neural networks (CNNs) have achieved great success in machine learning
applications, and much attention has been paid to their acceleration on field-programmable gate …
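For reference, the computation such a convolutional kernel implements reduces to a window of multiply-accumulates. The generic sketch below shows one output pixel of a 2D convolution and says nothing about the reconfigurable hardware structure proposed in the paper.

```python
def conv2d_pixel(image, weights, row, col):
    """Dot product of a KxK window of `image` (list of lists) with a KxK kernel."""
    k = len(weights)
    return sum(image[row + dr][col + dc] * weights[dr][dc]
               for dr in range(k) for dc in range(k))

image   = [[1, 2, 3, 0], [4, 5, 6, 1], [7, 8, 9, 2], [0, 1, 2, 3]]
weights = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]   # a 3x3 edge-detection kernel
print(conv2d_pixel(image, weights, 0, 0))        # -> 1+4+7 - (3+6+9) = -6
```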
Karatsuba with rectangular multipliers for FPGAs
This work presents an extension of Karatsuba's method to efficiently use rectangular
multipliers as a base for larger multipliers. The rectangular multipliers that motivate this work …
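The classical (square) Karatsuba step that the paper extends is sketched below: one splitting of each operand trades four sub-products for three. The asymmetric splitting needed to tile with rectangular DSP-sized multipliers, which is the paper's actual subject, is not reproduced, and the 17-bit limb size is only illustrative.

```python
def karatsuba(x: int, y: int, limb_bits: int = 17) -> int:
    """Multiply two unsigned integers, recursing with Karatsuba's three-product trick."""
    if x < (1 << limb_bits) and y < (1 << limb_bits):
        return x * y                              # small enough for one base multiplier
    half = max(x.bit_length(), y.bit_length()) // 2
    x1, x0 = x >> half, x & ((1 << half) - 1)     # x = x1 * 2**half + x0
    y1, y0 = y >> half, y & ((1 << half) - 1)
    z2 = karatsuba(x1, y1, limb_bits)
    z0 = karatsuba(x0, y0, limb_bits)
    z1 = karatsuba(x1 + x0, y1 + y0, limb_bits) - z2 - z0   # = x1*y0 + x0*y1
    return (z2 << (2 * half)) + (z1 << half) + z0

assert karatsuba(0xDEADBEEFCAFE, 0x123456789ABC) == 0xDEADBEEFCAFE * 0x123456789ABC
```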
On the RTL implementation of FINN matrix vector unit
Field-programmable gate array (FPGA)–based accelerators are becoming increasingly
popular for deep neural network (DNN) inference due to their ability to scale performance …
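As a purely behavioural reference for what a matrix-vector unit of this kind computes, the sketch below folds a weight matrix over groups of output rows and groups of input lanes per step; the parameter names PE and SIMD are borrowed only for illustration, and datatypes, thresholding and the RTL pipelining analysed in the paper are deliberately left out.

```python
def folded_mvu(weights, vec, pe: int, simd: int):
    """Matrix-vector product, processed PE output rows and SIMD inputs at a time."""
    rows, cols = len(weights), len(vec)
    assert rows % pe == 0 and cols % simd == 0
    out = [0] * rows
    for rt in range(rows // pe):                 # fold over groups of PE rows
        for ct in range(cols // simd):           # fold over groups of SIMD columns
            for p in range(pe):                  # parallel in hardware, sequential here
                r = rt * pe + p
                out[r] += sum(weights[r][ct * simd + s] * vec[ct * simd + s]
                              for s in range(simd))
    return out

W = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
x = [1, -1, 2, 0]
assert folded_mvu(W, x, pe=2, simd=2) == [sum(w * v for w, v in zip(row, x)) for row in W]
```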
Low-precision logarithmic arithmetic for neural network accelerators
Resource requirements for hardware acceleration of neural network inference are
notoriously high, both in terms of computation and storage. One way to mitigate this issue is …
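A minimal sketch of the idea, under simplifying assumptions: operands are stored as a sign plus a low-precision fixed-point log2 magnitude, so every multiplication becomes an integer addition in the log domain, while sums are still accumulated after conversion back to the linear domain. The specific formats, rounding and error analysis in the paper are not reproduced, and FRAC_BITS is an illustrative choice.

```python
import math

FRAC_BITS = 3                                # fractional bits of the log2 magnitude

def to_log(x: float):
    """Encode x as (sign, quantised log2|x|); zero is handled as a special flag."""
    if x == 0:
        return (0, None)
    return (1 if x > 0 else -1, round(math.log2(abs(x)) * (1 << FRAC_BITS)))

def log_mul(a, b):
    """Multiplication in the log domain is just an integer addition of the codes."""
    if a[1] is None or b[1] is None:
        return (0, None)
    return (a[0] * b[0], a[1] + b[1])

def to_linear(a) -> float:
    return 0.0 if a[1] is None else a[0] * 2.0 ** (a[1] / (1 << FRAC_BITS))

def log_dot(xs, ws) -> float:
    """Log-domain multiplies, linear-domain accumulation."""
    return sum(to_linear(log_mul(to_log(x), to_log(w))) for x, w in zip(xs, ws))

xs = [0.5, -1.25, 3.0, 0.0]
ws = [2.0, 0.75, -0.1, 5.0]
print(log_dot(xs, ws), sum(x * w for x, w in zip(xs, ws)))   # approximate vs exact
```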