High performance stencil code generation with Lift

B Hagedorn, L Stoltzfus, M Steuwer… - Proceedings of the …, 2018 - dl.acm.org
Stencil computations are widely used from physical simulations to machine learning. They
are embarrassingly parallel and perfectly fit modern hardware such as Graphic Processing …

Hybrid hexagonal/classical tiling for GPUs

T Grosser, A Cohen, J Holewinski… - Proceedings of Annual …, 2014 - dl.acm.org
Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical
hyper-rectangular tiles cannot be used due to the combination of backward and forward …

Verified lifting of stencil computations

S Kamil, A Cheung, S Itzhaky, A Solar-Lezama - ACM SIGPLAN Notices, 2016 - dl.acm.org
This paper demonstrates a novel combination of program synthesis and verification to lift
stencil computations from low-level Fortran code to a high-level summary expressed using a …

AN5D: automated stencil framework for high-degree temporal blocking on GPUs

K Matsumura, HR Zohouri, M Wahib, T Endo… - Proceedings of the 18th …, 2020 - dl.acm.org
Stencil computation is one of the most widely-used compute patterns in high performance
computing applications. Spatial and temporal blocking have been proposed to overcome the …

Multicore-optimized wavefront diamond blocking for optimizing stencil updates

T Malas, G Hager, H Ltaief, H Stengel, G Wellein… - SIAM Journal on …, 2015 - SIAM
The importance of stencil-based algorithms in computational science has focused attention
on optimized parallel implementations for multilevel cache-based processors. Temporal …

Diamond tiling: Tiling techniques to maximize parallelism for stencil computations

U Bondhugula, V Bandishti… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Most stencil computations allow tile-wise concurrent start, i.e., there always exists a face of
the iteration space and a set of tiling directions such that all tiles along that face can be …

Loop tiling in large-scale stencil codes at run-time with OPS

IZ Reguly, GR Mudalige… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
The key common bottleneck in most stencil codes is data movement, and prior research has
shown that improving data locality through optimisations that optimise across loops do …

Multidimensional intratile parallelization for memory-starved stencil computations

TM Malas, G Hager, H Ltaief, DE Keyes - ACM Transactions on Parallel …, 2017 - dl.acm.org
Optimizing the performance of stencil algorithms has been the subject of intense research
over the last two decades. Since many stencil schemes have low arithmetic intensity, most …

Flextended tiles: A flexible extension of overlapped tiles for polyhedral compilation

J Zhao, A Cohen - ACM Transactions on Architecture and Code …, 2019 - dl.acm.org
Loop tiling to exploit data locality and parallelism plays an essential role in a variety of
general-purpose and domain-specific compilers. Affine transformations in polyhedral …

Exploiting temporal data reuse and asynchrony in the reverse time migration

L Qu, R Abdelkhalak, H Ltaief, I Said… - … Journal of High …, 2023 - journals.sagepub.com
Reverse Time Migration (RTM) is a state-of-the-art algorithm used in seismic depth imaging
in complex geological environments for the oil and gas exploration industry. It calculates …