Software-defined Radios: Architecture, state-of-the-art, and challenges

R Akeela, B Dezfouli - Computer Communications, 2018 - Elsevier
Software-defined Radio (SDR) is a programmable transceiver with the capability of
operating various wireless communication protocols without the need to change or update …

Comparing energy efficiency of CPU, GPU and FPGA implementations for vision kernels

M Qasaimeh, K Denolf, J Lo, K Vissers… - … and systems (ICESS …, 2019 - ieeexplore.ieee.org
Developing high-performance embedded vision applications requires balancing run-time
performance with energy constraints. Given the mix of hardware accelerators that exist for …

Benchmarking vision kernels and neural network inference accelerators on embedded platforms

M Qasaimeh, K Denolf, A Khodamoradi, M Blott… - Journal of Systems …, 2021 - Elsevier
Developing efficient embedded vision applications requires exploring various algorithmic
optimization trade-offs and a broad spectrum of hardware architecture choices. This makes …

Architecturally truly diverse systems: A review

RD Chamberlain - Future Generation Computer Systems, 2020 - Elsevier
The pairing of traditional multicore processors with accelerators of various forms (e.g.,
graphics engines, reconfigurable logic) can be referred to generally as architecturally …

Energy efficient scientific computing on FPGAs using OpenCL

D Weller, F Oboril, D Lukarski, J Becker… - Proceedings of the 2017 …, 2017 - dl.acm.org
An indispensable part of our modern life is scientific computing, which is used in large-scale
high-performance systems as well as in low-power smart cyber-physical systems. Hence …

Transmuter: Bridging the efficiency gap using memory and dataflow reconfiguration

S Pal, S Feng, D Park, S Kim, A Amarnath… - Proceedings of the …, 2020 - dl.acm.org
With the end of Dennard scaling and Moore's law, it is becoming increasingly difficult to build
hardware for emerging applications that meet power and performance targets, while …

Function Placement for In-network Federated Learning

B Addis, S Boumerdassi, R Riggio, S Secci - Computer Networks, 2025 - Elsevier
Federated learning (FL), particularly when data is distributed across multiple clients, helps
reduce the learning time by avoiding training on a massive pile-up of data. Nonetheless …

A 7.3 M output non-zeros/J, 11.7 M output non-zeros/GB reconfigurable sparse matrix–matrix multiplication accelerator

DH Park, S Pal, S Feng, P Gao, J Tan… - IEEE Journal of Solid …, 2020 - ieeexplore.ieee.org
A sparse matrix-matrix multiplication (SpMM) accelerator with 48 heterogeneous cores and
a reconfigurable memory hierarchy is fabricated in 40-nm CMOS. The compute fabric …

Accelerating sparse deep neural networks on FPGAs

S Huang, C Pearson, R Nagi, J Xiong… - 2019 IEEE High …, 2019 - ieeexplore.ieee.org
Deep neural networks (DNNs) have been widely adopted in many domains, including
computer vision, natural language processing, and medical care. Recent research reveals …

Early experiences migrating cuda codes to oneapi

M Costanzo, E Rucci, CG Sanchez… - arXiv preprint arXiv …, 2021 - arxiv.org
The heterogeneous computing paradigm represents a real programming challenge due to
the proliferation of devices with different hardware characteristics. Recently Intel introduced …