An efficient hardware supported and parallelization architecture for intelligent systems to overcome speculative overheads

S Kumar, SK Singh, N Aggarwal… - … Journal of Intelligent …, 2022 - Wiley Online Library
In the last few decades, technology advancements have paved the way for the creation of
intelligent and autonomous systems that utilize complex calculations which are both time …

ScaleHLS: A new scalable high-level synthesis framework on multi-level intermediate representation

H Ye, C Hao, J Cheng, H Jeong… - … symposium on high …, 2022 - ieeexplore.ieee.org
High-level synthesis (HLS) has been widely adopted as it significantly improves the
hardware design productivity and enables efficient design space exploration (DSE). Existing …

Patus: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures

M Christen, O Schenk, H Burkhart - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
Stencil calculations comprise an important class of kernels in many scientific computing
applications ranging from simple PDE solvers to constituent kernels in multigrid methods as …

The Cetus source-to-source compiler infrastructure: overview and evaluation

H Bae, D Mustafa, JW Lee, Aurangzeb, H Lin… - International Journal of …, 2013 - Springer
This paper provides an overview and an evaluation of the Cetus source-to-source compiler
infrastructure. The original goal of the Cetus project was to create an easy-to-use compiler …

OpenMPC: Extended OpenMP programming and tuning for GPUs

S Lee, R Eigenmann - SC'10: Proceedings of the 2010 ACM …, 2010 - ieeexplore.ieee.org
General-Purpose Graphics Processing Units (GPGPUs) are promising parallel platforms for
high performance computing. The CUDA (Compute Unified Device Architecture) …

Compass: A framework for automated performance modeling and prediction

S Lee, JS Meredith, JS Vetter - Proceedings of the 29th ACM on …, 2015 - dl.acm.org
Flexible, accurate performance predictions offer numerous benefits such as gaining insight
into and optimizing applications and architectures. However, the development and …

OpenACC to FPGA: A framework for directive-based high-performance reconfigurable computing

S Lee, J Kim, JS Vetter - 2016 IEEE International Parallel and …, 2016 - ieeexplore.ieee.org
This paper presents a directive-based, high-level programming framework for high-
performance reconfigurable computing. It takes a standard, portable OpenACC C program …

Hauberk: Lightweight silent data corruption error detector for GPGPU

KS Yim, C Pham, M Saleheen… - … Parallel & Distributed …, 2011 - ieeexplore.ieee.org
High performance and relatively low cost of GPU-based platforms provide an attractive
alternative for general purpose high performance computing (HPC). However, the emerging …

OpenARC: Open accelerator research compiler for directive-based, efficient heterogeneous computing

S Lee, JS Vetter - Proceedings of the 23rd international symposium on …, 2014 - dl.acm.org
This paper presents Open Accelerator Research Compiler (OpenARC): an open-source
framework that supports the full feature set of OpenACC V1.0 and performs source-to …

Advising OpenMP parallelization via a graph-based approach with transformers

T Kadosh, N Schneider, N Hasabnis, T Mattson… - … Workshop on OpenMP, 2023 - Springer
There is an ever-present need for shared memory parallelization schemes to exploit the full
potential of multi-core architectures. The most common parallelization API addressing this …