The digital revolution of Earth-system science

P Bauer, PD Dueben, T Hoefler, T Quintino… - Nature Computational …, 2021 - nature.com
Computational science is crucial for delivering reliable weather and climate predictions.
However, despite decades of high-performance computing experience, there is serious …

Instead of rewriting foreign code for machine learning, automatically synthesize fast gradients

W Moses, V Churavy - Advances in neural information …, 2020 - proceedings.neurips.cc
Applying differentiable programming techniques and machine learning algorithms to foreign
programs requires developers to either rewrite their code in a machine learning framework …

Productivity, portability, performance: Data-centric Python

AN Ziogas, T Schneider, T Ben-Nun… - Proceedings of the …, 2021 - dl.acm.org
Python has become the de facto language for scientific computing. Programming in Python
is highly productive, mainly due to its rich science-oriented software ecosystem built around …

Scalable distributed high-order stencil computations

M Jacquelin, M Araya-Polo… - … Conference for High …, 2022 - ieeexplore.ieee.org
Stencil computations lie at the heart of many scientific and industrial applications. Stencil
algorithms pose several challenges on machines with cache based memory hierarchy, due …

Progressive raising in multi-level IR

L Chelini, A Drebes, O Zinenko… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
Multi-level intermediate representations (IR) show great promise for lowering the design
costs for domain-specific compilers by providing a reusable, extensible, and non-opini …

Toward accelerated stencil computation by adapting tensor core unit on GPU

X Liu, Y Liu, H Yang, J Liao, M Li, Z Luan… - Proceedings of the 36th …, 2022 - dl.acm.org
The Tensor Core Unit (TCU) has been increasingly adopted on modern high performance
processors, specialized in boosting the performance of general matrix multiplication …

Automatic creation of high-bandwidth memory architectures from domain-specific languages: The case of computational fluid dynamics

S Soldavini, K Friebel, M Tibaldi, G Hempel… - ACM Transactions on …, 2023 - dl.acm.org
Numerical simulations can help solve complex problems. Most of these algorithms are
massively parallel and thus good candidates for FPGA acceleration thanks to spatial …

High-performance GPU-to-CPU transpilation and optimization via high-level parallel constructs

WS Moses, IR Ivanov, J Domke, T Endo… - Proceedings of the 28th …, 2023 - dl.acm.org
While parallelism remains the main source of performance, architectural implementations
and programming models change with each new hardware generation, often leading to …

Productive performance engineering for weather and climate modeling with Python

T Ben-Nun, L Groner, F Deconinck… - … Conference for High …, 2022 - ieeexplore.ieee.org
Earth system models are developed with a tight coupling to target hardware, often
containing specialized code predicated on processor characteristics. This coupling stems …

OCC: An automated end-to-end machine learning optimizing compiler for computing-in-memory

A Siemieniuk, L Chelini, AA Khan… - … on Computer-Aided …, 2021 - ieeexplore.ieee.org
Memristive devices promise an alternative approach toward non-Von Neumann
architectures, where specific computational tasks are performed within the memory devices …