High-level synthesis for FPGAs: From prototyping to deployment
Escalating system-on-chip design complexity is pushing the design community to raise the
level of abstraction beyond register transfer level. Despite the unsuccessful adoptions of …
Single-chip heterogeneous computing: Does the future include custom logic, FPGAs, and GPGPUs?
To extend the exponential performance scaling of future chip multiprocessors, improving
energy efficiency has become a first-class priority. Single-chip heterogeneous computing …
Efficient data supply for hardware accelerators with prefetching and access/execute decoupling
This paper presents an architecture framework to easily design hardware accelerators that
can effectively tolerate long and variable memory latency using prefetching and …
Impact of cache architecture and interface on performance and area of FPGA-based processor/parallel-accelerator systems
We describe new multi-ported cache designs suitable for use in FPGA-based
processor/parallel-accelerator systems, and evaluate their impact on application …
TAPAS: Generating parallel accelerators from parallel programs
High-level-synthesis (HLS) tools generate accelerators from software programs to ease the
task of building hardware. Unfortunately, current HLS tools have limited support for …
Fusion: Design tradeoffs in coherent cache hierarchies for accelerators
Chip designers have shown increasing interest in integrating specialized fixed-function
coprocessors into multicore designs to improve energy efficiency. Recent work in academia …
SOFF: An OpenCL high-level synthesis framework for FPGAs
Recently, OpenCL has been emerging as a programming model for energy-efficient FPGA
accelerators. However, the state-of-the-art OpenCL frameworks for FPGAs suffer from poor …
ASTRO: Synthesizing application-specific reconfigurable hardware traces to exploit memory-level parallelism
Emerging integrated CPU+FPGA hybrid platforms, such as the Extensible Processing
Platform architecture from Xilinx [1], offer unprecedented opportunity to achieving both …
MATCHUP: Memory abstractions for heap manipulating programs
Memory-intensive implementations often require access to an external, off-chip memory
which can substantially slow down an FPGA accelerator due to memory bandwidth …
Efficient complex operators for irregular codes
Complex “fat operators” are important contributors to the efficiency of specialized hardware.
This paper introduces two new techniques for constructing efficient fat operators featuring up …