AN5D: automated stencil framework for high-degree temporal blocking on GPUs

K Matsumura, HR Zohouri, M Wahib, T Endo… - Proceedings of the 18th …, 2020 - dl.acm.org
Stencil computation is one of the most widely-used compute patterns in high performance
computing applications. Spatial and temporal blocking have been proposed to overcome the …

swSpTRSV: A fast sparse triangular solve with sparse level tile layout on Sunway architectures

X Wang, W Liu, W Xue, L Wu - Proceedings of the 23rd ACM SIGPLAN …, 2018 - dl.acm.org
Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world
applications. Currently, much research on parallel SpTRSV focuses on level-set construction …

Towards efficient SpMV on Sunway manycore architectures

C Liu, B Xie, X Liu, W Xue, H Yang, X Liu - Proceedings of the 2018 …, 2018 - dl.acm.org
Sparse Matrix-Vector Multiplication (SpMV) is an essential computation kernel for many data-
analytic workloads running in both supercomputers and data centers. The intrinsic …

Parallelization and optimization of NSGA-II on Sunway TaihuLight system

X Liu, J Sun, L Zheng, S Wang, Y Liu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Sunway TaihuLight system is the first supercomputer offering a peak performance over 100
PFlops, which can be utilized to parallelize Non-dominated Sorting Genetic Algorithm II …

Automatic code generation and optimization of large-scale stencil computation on many-core processors

M Li, Y Liu, H Yang, Y Hu, Q Sun, B Chen… - Proceedings of the 50th …, 2021 - dl.acm.org
Stencil computation is an indispensable building block of many scientific applications and is
widely used by the numerical solvers of partial differential equations (PDEs). Due to the …

Parallel optimization and application of unstructured sparse triangular solver on new generation of Sunway architecture

J Li, L Li, Q Wang, W Xue, J Liang, J Shi - Parallel Computing, 2024 - Elsevier
Large-scale sparse linear equation solver plays an important role in both numerical
simulation and artificial intelligence, and sparse triangular equation solver is a key step in …

Exploiting temporal data reuse and asynchrony in the reverse time migration

L Qu, R Abdelkhalak, H Ltaief, I Said… - … Journal of High …, 2023 - journals.sagepub.com
Reverse Time Migration (RTM) is a state-of-the-art algorithm used in seismic depth imaging
in complex geological environments for the oil and gas exploration industry. It calculates …

LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores

Y Zhang, K Li, L Yuan, J Cheng… - … Conference for High …, 2024 - ieeexplore.ieee.org
Stencil computations play a pivotal role in numerous scientific and industrial applications,
yet their efficient execution on specialized hardware accelerators like Tensor Core Units …

Taming the "Monster": Overcoming program optimization challenges on SW26010 through precise performance modeling

S Xu, Y Xu, W Xue, X Shen, F Zheng… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
This paper presents an effort for overcoming the complexities of program optimizations on
SW26010, the heterogeneous many-core processor that powers Sunway TaihuLight, the …

Bandwidth-aware loop tiling for DMA-supported scratchpad memory

M Wu, Y Liu, H Cui, Q Wei, Q Li, L Li, F Lv… - Proceedings of the …, 2020 - dl.acm.org
Scratchpad Memory (SPM) is widely used in emerging domain-specific architectures and
accelerators for improving energy efficiency and time predictability. Typically, SPM-based …