An efficient hardware supported and parallelization architecture for intelligent systems to overcome speculative overheads
In the last few decades, technology advancements have paved the way for the creation of
intelligent and autonomous systems that utilize complex calculations which are both time …
ScaleHLS: A new scalable high-level synthesis framework on multi-level intermediate representation
High-level synthesis (HLS) has been widely adopted as it significantly improves the
hardware design productivity and enables efficient design space exploration (DSE). Existing …
PATUS: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures
Stencil calculations comprise an important class of kernels in many scientific computing
applications ranging from simple PDE solvers to constituent kernels in multigrid methods as …
The Cetus source-to-source compiler infrastructure: overview and evaluation
This paper provides an overview and an evaluation of the Cetus source-to-source compiler
infrastructure. The original goal of the Cetus project was to create an easy-to-use compiler …
OpenMPC: Extended OpenMP programming and tuning for GPUs
General-Purpose Graphics Processing Units (GPGPUs) are promising parallel platforms for
high performance computing. The CUDA (Compute Unified Device Architecture) …
Compass: A framework for automated performance modeling and prediction
Flexible, accurate performance predictions offer numerous benefits such as gaining insight
into and optimizing applications and architectures. However, the development and …
OpenACC to FPGA: A framework for directive-based high-performance reconfigurable computing
This paper presents a directive-based, high-level programming framework for high-
performance reconfigurable computing. It takes a standard, portable OpenACC C program …
Hauberk: Lightweight silent data corruption error detector for GPGPU
High performance and relatively low cost of GPU-based platforms provide an attractive
alternative for general purpose high performance computing (HPC). However, the emerging …
OpenARC: Open accelerator research compiler for directive-based, efficient heterogeneous computing
This paper presents Open Accelerator Research Compiler (OpenARC): an open-source
framework that supports the full feature set of OpenACC V1.0 and performs source-to …
Advising OpenMP parallelization via a graph-based approach with transformers
There is an ever-present need for shared memory parallelization schemes to exploit the full
potential of multi-core architectures. The most common parallelization API addressing this …