AN5D: automated stencil framework for high-degree temporal blocking on GPUs
Stencil computation is one of the most widely-used compute patterns in high performance
computing applications. Spatial and temporal blocking have been proposed to overcome the …
swSpTRSV: A fast sparse triangular solve with sparse level tile layout on Sunway architectures
Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world
applications. Currently, much research on parallel SpTRSV focuses on level-set construction …
Towards efficient SpMV on Sunway manycore architectures
Sparse Matrix-Vector Multiplication (SpMV) is an essential computation kernel for many data-
analytic workloads running in both supercomputers and data centers. The intrinsic …
Parallelization and optimization of NSGA-II on Sunway TaihuLight system
X Liu, J Sun, L Zheng, S Wang, Y Liu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Sunway TaihuLight system is the first supercomputer offering a peak performance over 100
PFlops, which can be utilized to parallelize Non-dominated Sorting Genetic Algorithm II …
Automatic code generation and optimization of large-scale stencil computation on many-core processors
Stencil computation is an indispensable building block of many scientific applications and is
widely used by the numerical solvers of partial differential equations (PDEs). Due to the …
Parallel optimization and application of unstructured sparse triangular solver on new generation of Sunway architecture
Large-scale sparse linear equation solver plays an important role in both numerical
simulation and artificial intelligence, and sparse triangular equation solver is a key step in …
Exploiting temporal data reuse and asynchrony in the reverse time migration
Reverse Time Migration (RTM) is a state-of-the-art algorithm used in seismic depth imaging
in complex geological environments for the oil and gas exploration industry. It calculates …
LoRAStencil: Low-Rank Adaptation of Stencil Computation on Tensor Cores
Stencil computations play a pivotal role in numerous scientific and industrial applications,
yet their efficient execution on specialized hardware accelerators like Tensor Core Units …
Taming the "Monster": Overcoming program optimization challenges on SW26010 through precise performance modeling
This paper presents an effort for overcoming the complexities of program optimizations on
SW26010, the heterogeneous many-core processor that powers Sunway TaihuLight, the …
Bandwidth-aware loop tiling for DMA-supported scratchpad memory
M Wu, Y Liu, H Cui, Q Wei, Q Li, L Li, F Lv… - Proceedings of the …, 2020 - dl.acm.org
Scratchpad Memory (SPM) is widely used in emerging domain-specific architectures and
accelerators for improving energy efficiency and time predictability. Typically, SPM-based …