swSpTRSV: A fast sparse triangular solve with sparse level tile layout on Sunway architectures

X Wang, W Liu, W Xue, L Wu - Proceedings of the 23rd ACM SIGPLAN …, 2018 - dl.acm.org
Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world
applications. Currently, much research on parallel SpTRSV focuses on level-set construction …

Parallelization and optimization of NSGA-II on Sunway TaihuLight system

X Liu, J Sun, L Zheng, S Wang, Y Liu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Sunway TaihuLight system is the first supercomputer offering a peak performance over 100
PFlops, which can be utilized to parallelize Non-dominated Sorting Genetic Algorithm II …

Parallel optimization and application of unstructured sparse triangular solver on new generation of Sunway architecture

J Li, L Li, Q Wang, W Xue, J Liang, J Shi - Parallel Computing, 2024 - Elsevier
Large-scale sparse linear equation solver plays an important role in both numerical
simulation and artificial intelligence, and sparse triangular equation solver is a key step in …

Increasing the efficiency of massively parallel sparse matrix-matrix multiplication in first-principles calculation on the new-generation Sunway supercomputer

X Chen, Y Gao, H Shang, F Li, Z Xu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
The first-principles approach based on density-functional theory (DFT)/density-functional
perturbation theory (DFPT) is widely used in calculations of the systems' ground state …

Modified fast inverse square root and square root approximation algorithms: The method of switching magic constants

LV Moroz, VV Samotyy, OY Horyachyy - Computation, 2021 - mdpi.com
Many low-cost platforms that support floating-point arithmetic, such as microcontrollers and
field-programmable gate arrays, do not include fast hardware or software methods for …

Evaluating the SW26010 many-core processor with a micro-benchmark suite for performance optimizations

J Lin, Z Xu, L Cai, A Nukada, S Matsuoka - Parallel Computing, 2018 - Elsevier
The inadequate public information of China's SW26010 processor's micro-architecture
prevents global researchers from improving application performances on the TaihuLight …

Enabling highly efficient batched matrix multiplications on SW26010 many-core processor

L Jiang, C Yang, W Ma - ACM Transactions on Architecture and Code …, 2020 - dl.acm.org
We present a systematic methodology for optimizing batched matrix multiplications on
SW26010 many-core processor of the Sunway TaihuLight supercomputer. Five surrogate …

A modification of the fast inverse square root algorithm

CJ Walczyk, LV Moroz, JL Cieśliński - Computation, 2019 - mdpi.com
We present a new algorithm for the approximate evaluation of the inverse square root for
single-precision floating-point numbers. This is a modification of the famous fast inverse …

Algorithms for calculating the square root and inverse square root based on the second-order Householder's method

L Moroz, V Samotyy, O Horyachyy… - 2019 10th IEEE …, 2019 - ieeexplore.ieee.org
This article proposes a set of algorithms for calculating the square root and inverse square
root for normalized single and double precision floating-point numbers. They are based on …

Efficient floating-point square root and reciprocal square root algorithms

L Moroz, V Samotyy, M Węgrzyn… - 2021 11th IEEE …, 2021 - ieeexplore.ieee.org
Several algorithms for calculating square roots and inverse square roots are developed.
These are oriented on normalized numbers with a floating point for single and double …