High-level synthesis for FPGAs: From prototyping to deployment

J Cong, B Liu, S Neuendorffer… - … on Computer-Aided …, 2011 - ieeexplore.ieee.org
Escalating system-on-chip design complexity is pushing the design community to raise the
level of abstraction beyond register transfer level. Despite the unsuccessful adoptions of …

Single-chip heterogeneous computing: Does the future include custom logic, FPGAs, and GPGPUs?

ES Chung, PA Milder, JC Hoe… - 2010 43rd annual IEEE …, 2010 - ieeexplore.ieee.org
To extend the exponential performance scaling of future chip multiprocessors, improving
energy efficiency has become a first-class priority. Single-chip heterogeneous computing …

Efficient data supply for hardware accelerators with prefetching and access/execute decoupling

T Chen, GE Suh - 2016 49th Annual IEEE/ACM International …, 2016 - ieeexplore.ieee.org
This paper presents an architecture framework to easily design hardware accelerators that
can effectively tolerate long and variable memory latency using prefetching and …

Impact of cache architecture and interface on performance and area of FPGA-based processor/parallel-accelerator systems

J Choi, K Nam, A Canis, J Anderson… - 2012 IEEE 20th …, 2012 - ieeexplore.ieee.org
We describe new multi-ported cache designs suitable for use in FPGA-based
processor/parallel-accelerator systems, and evaluate their impact on application …

TAPAS: Generating parallel accelerators from parallel programs

S Margerm, A Sharifian, A Guha… - 2018 51st Annual …, 2018 - ieeexplore.ieee.org
High-level-synthesis (HLS) tools generate accelerators from software programs to ease the
task of building hardware. Unfortunately, current HLS tools have limited support for …

Fusion: Design tradeoffs in coherent cache hierarchies for accelerators

S Kumar, A Shriraman, N Vedula - Proceedings of the 42nd Annual …, 2015 - dl.acm.org
Chip designers have shown increasing interest in integrating specialized fixed-function
coprocessors into multicore designs to improve energy efficiency. Recent work in academia …

SOFF: An OpenCL high-level synthesis framework for FPGAs

G Jo, H Kim, J Lee, J Lee - 2020 ACM/IEEE 47th Annual …, 2020 - ieeexplore.ieee.org
Recently, OpenCL has been emerging as a programming model for energy-efficient FPGA
accelerators. However, the state-of-the-art OpenCL frameworks for FPGAs suffer from poor …

ASTRO: Synthesizing application-specific reconfigurable hardware traces to exploit memory-level parallelism

M Lin, S Chen, RF DeMara, J Wawrzynek - Microprocessors and …, 2015 - Elsevier
Emerging integrated CPU+FPGA hybrid platforms, such as the Extensible Processing
Platform architecture from Xilinx [1], offer unprecedented opportunity to achieving both …

MATCHUP: Memory abstractions for heap manipulating programs

F Winterstein, K Fleming, HJ Yang, S Bayliss… - Proceedings of the …, 2015 - dl.acm.org
Memory-intensive implementations often require access to an external, off-chip memory
which can substantially slow down an FPGA accelerator due to memory bandwidth …

Efficient complex operators for irregular codes

J Sampson, G Venkatesh… - 2011 IEEE 17th …, 2011 - ieeexplore.ieee.org
Complex “fat operators” are important contributors to the efficiency of specialized hardware.
This paper introduces two new techniques for constructing efficient fat operators featuring up …