LLVM-based automation of memory decoupling for OpenCL applications on FPGAs

AA Purkayastha, S Rogers, SA Shiddibhavi… - Microprocessors and …, 2020 - Elsevier
Abstract The availability of OpenCL High-Level Synthesis (OpenCL-HLS) has made FPGAs
an attractive platform for power-efficient high-performance execution of massively parallel …

Exploring the efficiency of OpenCL pipe for hiding memory latency on cloud FPGAs

AA Purkayastha, S Raghavendran… - 2019 IEEE High …, 2019 - ieeexplore.ieee.org
OpenCL programming ability combined with OpenCL High-Level Synthesis (OpenCL-HLS)
tools have made tremendous improvements in the reconfigurable computing field. FPGAs …

SB-Fetch: Synchronization aware hardware prefetching for chip multiprocessors

LM AlBarakat, PV Gratz, DA Jiménez - Proceedings of the 34th ACM …, 2020 - dl.acm.org
Shared-memory, multi-threaded applications often require programmers to insert thread
synchronization primitives (i.e., locks, barriers, and condition variables) in critical sections to …

Leveraging LLVM IR for Design Space Exploration and Modeling of Application and Domain-Specific Hardware

S Rogers - 2022 - search.proquest.com
The limitations of transistor scaling and the rise of power-constrained compute environments
have driven a new renaissance in hardware design through hardware acceleration …

XStream: Cross-core spatial streaming based MLC prefetchers for parallel applications in CMPs

B Panda, S Balachandran - … of the 23rd international conference on …, 2014 - dl.acm.org
Hardware prefetchers are commonly used to hide and tolerate off-chip memory latency.
Prefetching techniques in the literature are designed for multiple independent sequential …

Empowering FPGAs for massively parallel applications

SA Shiddibhavi - 2018 - search.proquest.com
Abstract The availability of OpenCL High-Level Synthesis (OpenCL-HLS) has made FPGAs
an attractive platform for power-efficient high-performance execution of massively parallel …

Speculative Techniques for Memory Hierarchy Management

LM AlBarakat - 2021 - search.proquest.com
Abstract The “Memory Wall” is the gap in performance between the processor and the main
memory. Over the last 30 years computer architects have added multiple levels of cache to …

Empowering Reconfigurable Platforms for Massively Parallel Applications

AA Purkayastha - 2021 - search.proquest.com
The availability of OpenCL for FPGAs along with High-Level Synthesis tools have made
them an attractive platform for implementing compute-intensive massively parallel …

Prefetch strategy control for parallel execution of threads based on one or more characteristics of a stream of program instructions indicative that a data access …

GS Dasika, R Holm, DH Mansell - US Patent 11,494,188, 2022 - Google Patents
A single instruction multiple thread (SIMT) processor includes execution circuitry, prefetch
circuitry and prefetch strategy selection circuitry. The prefetch strategy selection circuitry …