High performance stencil code generation with Lift
Stencil computations are widely used from physical simulations to machine learning. They
are embarrassingly parallel and perfectly fit modern hardware such as Graphics Processing …
Hybrid hexagonal/classical tiling for GPUs
Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical
hyper-rectangular tiles cannot be used due to the combination of backward and forward …
Verified lifting of stencil computations
This paper demonstrates a novel combination of program synthesis and verification to lift
stencil computations from low-level Fortran code to a high-level summary expressed using a …
AN5D: automated stencil framework for high-degree temporal blocking on GPUs
Stencil computation is one of the most widely used compute patterns in high performance
computing applications. Spatial and temporal blocking have been proposed to overcome the …
Multicore-optimized wavefront diamond blocking for optimizing stencil updates
The importance of stencil-based algorithms in computational science has focused attention
on optimized parallel implementations for multilevel cache-based processors. Temporal …
Diamond tiling: Tiling techniques to maximize parallelism for stencil computations
Most stencil computations allow tile-wise concurrent start, i.e., there always exists a face of
the iteration space and a set of tiling directions such that all tiles along that face can be …
Loop tiling in large-scale stencil codes at run-time with OPS
The key common bottleneck in most stencil codes is data movement, and prior research has
shown that improving data locality through optimisations that optimise across loops do …
Multidimensional intratile parallelization for memory-starved stencil computations
Optimizing the performance of stencil algorithms has been the subject of intense research
over the last two decades. Since many stencil schemes have low arithmetic intensity, most …
Flextended tiles: A flexible extension of overlapped tiles for polyhedral compilation
Loop tiling to exploit data locality and parallelism plays an essential role in a variety of
general-purpose and domain-specific compilers. Affine transformations in polyhedral …
Exploiting temporal data reuse and asynchrony in the reverse time migration
Reverse Time Migration (RTM) is a state-of-the-art algorithm used in seismic depth imaging
in complex geological environments for the oil and gas exploration industry. It calculates …