swSpTRSV: A fast sparse triangular solve with sparse level tile layout on Sunway architectures

X Wang, W Liu, W Xue, L Wu - Proceedings of the 23rd ACM SIGPLAN …, 2018 - dl.acm.org
Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world
applications. Currently, much research on parallel SpTRSV focuses on level-set construction …
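The snippet above mentions level-set construction for parallel SpTRSV. The sketch below shows the generic textbook version of that idea in Python. It is not swSpTRSV's sparse level tile layout. It assumes a lower-triangular matrix stored in CSR form (`row_ptr`, `col_idx`, `vals`) with a nonzero diagonal in every row. The function names `build_level_sets` and `sptrsv_level_scheduled` are hypothetical.

```python
from collections import defaultdict


def build_level_sets(n, row_ptr, col_idx):
    """Group rows of a lower-triangular CSR matrix into level sets.

    Row i depends on every row j < i that has a nonzero L[i, j]. Its level is
    one more than the deepest such dependency, or 0 if it has none.
    """
    level = [0] * n
    for i in range(n):  # rows in increasing order, so level[j] is final for j < i
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]
            if j < i:  # off-diagonal entry: x[j] must be known before x[i]
                level[i] = max(level[i], level[j] + 1)
    sets = defaultdict(list)
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return [sets[lv] for lv in range(max(level) + 1)] if n else []


def sptrsv_level_scheduled(n, row_ptr, col_idx, vals, b):
    """Solve L x = b one level at a time (assumes a nonzero diagonal in every row)."""
    x = [0.0] * n
    for rows in build_level_sets(n, row_ptr, col_idx):
        # Rows in the same level do not depend on each other, so this loop
        # is the part a parallel implementation would distribute across cores.
        for i in rows:
            s = b[i]
            diag = None
            for k in range(row_ptr[i], row_ptr[i + 1]):
                j = col_idx[k]
                if j < i:
                    s -= vals[k] * x[j]
                elif j == i:
                    diag = vals[k]
            x[i] = s / diag
    return x


# Example: L = [[2,0,0,0],[1,1,0,0],[0,0,4,0],[0,3,0,1]] in CSR form.
row_ptr = [0, 1, 3, 4, 6]
col_idx = [0, 0, 1, 2, 1, 3]
vals = [2.0, 1.0, 1.0, 4.0, 3.0, 1.0]
print(build_level_sets(4, row_ptr, col_idx))  # [[0, 2], [1], [3]]
print(sptrsv_level_scheduled(4, row_ptr, col_idx, vals, [2.0, 3.0, 12.0, 10.0]))  # [1.0, 2.0, 3.0, 4.0]
```

Rows 0 and 2 have no off-diagonal dependencies, so they form level 0 and could be solved concurrently. The solve therefore needs only three sequential steps instead of four. How well this works in practice depends on how many rows each level contains.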

Parallelization and optimization of NSGA-II on Sunway TaihuLight system

X Liu, J Sun, L Zheng, S Wang, Y Liu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Sunway TaihuLight system is the first supercomputer offering a peak performance over 100
PFlops, which can be utilized to parallelize Non-dominated Sorting Genetic Algorithm II …

Accelerating DES and AES algorithms for a heterogeneous many-core processor

B **ng, DD Wang, Y Yang, Z Wei, J Wu… - International Journal of …, 2021 - Springer
Data security is the focus of information security. As a primary method, file encryption is
adopted for ensuring data security. Encryption algorithms created to meet the Data …

swCaffe: A parallel framework for accelerating deep learning applications on Sunway TaihuLight

L Li, J Fang, H Fu, J Jiang, W Zhao… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
This paper reports our efforts on swCaffe, a high-efficient parallel framework for accelerating
deep neural networks (DNNs) training on Sunway TaihuLight, one of the fastest …

Automatic multi-parameter performance modeling of HPC applications on a new Sunway supercomputer

Y Zhang, Y Liu, P Jiao, Y Zhou… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
As the successor to Sunway TaihuLight, the new Sunway supercomputer has ultra-high
computing capacity, but the unique heterogeneous architecture presents performance …

ESA: An efficient sequence alignment algorithm for biological database search on Sunway TaihuLight

H Zhang, Z Huang, Y Chen, J Liang, X Gao - Parallel Computing, 2023 - Elsevier
In computational biology, biological database search has been playing a very important role.
Since the COVID-19 outbreak, it has provided significant help in identifying common …

Novel parallelization of discontinuous Galerkin method for transient electromagnetics simulation based on Sunway supercomputers

M Li, Q Wu, Z Lin, Y Zhang… - The Applied …, 2022 - journals.riverpublishers.com
A novel parallelization of discontinuous Galerkin time-domain (DGTD) method hybrid with
the local time step (LTS) method on Sunway supercomputers for electromagnetic simulation …

Function-safe vehicular AI processor with nano core-in-memory architecture

Y Kwon, J Yang, YP Cho, KS Shin… - … Circuits and Systems …, 2019 - ieeexplore.ieee.org
State-of-the-art neural network accelerators consist of arithmetic engines organized in a
mesh structure datapath surrounded by memory blocks that provide neural data to the …

Adapting combined tiling to stencil optimizations on Sunway processor

B Sun, M Li, H Yang, J Xu, Z Luan, D Qian - CCF Transactions on High …, 2023 - Springer
Stencil is one of the indispensable computation patterns in scientific applications, which is a
long-standing optimization target in the field of high performance computing (HPC). The …

The spatial computer: A model for energy-efficient parallel computation

L Gianinazzi, T Ben-Nun, M Besta, S Ashkboos… - arXiv preprint arXiv …, 2022 - arxiv.org
We present a new parallel model of computation suitable for spatial architectures, for which
the energy used for communication heavily depends on the distance of the communicating …