Extreme-scale ab initio quantum Raman spectra simulations on the leadership HPC system in China

H Shang, F Li, Y Zhang, L Zhang, Y Fu, Y Gao… - Proceedings of the …, 2021 - dl.acm.org
Raman spectroscopy provides chemical and compositional information that can serve as a
structural fingerprint for various materials. Therefore, simulations of Raman spectra …

Parallelization and optimization of NSGA-II on Sunway TaihuLight system

X Liu, J Sun, L Zheng, S Wang, Y Liu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
The Sunway TaihuLight system is the first supercomputer offering a peak performance over 100
PFlops, which can be utilized to parallelize Non-dominated Sorting Genetic Algorithm II …

Optimizing small channel 3D convolution on GPU with tensor core

J Jiang, D Huang, J Du, Y Lu, X Liao - Parallel Computing, 2022 - Elsevier
In many scenarios, particularly scientific AI applications, algorithm engineers widely adopt
more complex convolution, e.g., 3D CNN, to improve the accuracy. Scientific AI applications …

YaConv: Convolution with low cache footprint

I Korostelev, JP L. De Carvalho, J Moreira… - ACM Transactions on …, 2023 - dl.acm.org
This article introduces YaConv, a new algorithm to compute convolution using GEMM
microkernels from a Basic Linear Algebra Subprograms library that is efficient for multiple …

Layup: Layer-adaptive and multi-type intermediate-oriented memory optimization for GPU-based CNNs

W Jiang, Y Ma, B Liu, H Liu, BB Zhou, J Zhu… - ACM Transactions on …, 2019 - dl.acm.org
Although GPUs have emerged as the mainstream for the acceleration of convolutional
neural network (CNN) training processes, they usually have limited physical memory …

Accelerating all-electron ab initio simulation of Raman spectra for biological systems

H Shang, F Li, Y Zhang, Y Liu, L Zhang, M Wu… - Proceedings of the …, 2021 - dl.acm.org
Raman spectroscopy provides chemical and compositional information that can serve as a
structural fingerprint for various materials. Therefore, simulations of Raman spectra …

Bandwidth-aware loop tiling for DMA-supported scratchpad memory

M Wu, Y Liu, H Cui, Q Wei, Q Li, L Li, F Lv… - Proceedings of the …, 2020 - dl.acm.org
Scratchpad Memory (SPM) is widely used in emerging domain-specific architectures and
accelerators for improving energy efficiency and time predictability. Typically, SPM-based …

Large-scale automatic k-means clustering for heterogeneous many-core supercomputer

T Yu, W Zhao, P Liu, V Janjic, X Yan… - … on Parallel and …, 2019 - ieeexplore.ieee.org
This article presents an automatic k-means clustering solution targeting the Sunway
TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that not …

High performance reconfigurable computing for numerical simulation and deep learning

L Gan, M Yuan, J Yang, W Zhao, W Luk… - CCF Transactions on High …, 2020 - Springer
Due to their customizable on-chip resources, reconfigurable computing platforms such as
FPGAs are able to achieve better time-to-solution and energy-to-solution than general …

A dynamic agricultural prediction system for large-scale drought assessment on the Sunway TaihuLight supercomputer

X Huang, C Yu, J Fang, G Huang, S Ni, J Hall… - … and electronics in …, 2018 - Elsevier
Crop models are widely used to evaluate the response of crop growth to drought. However,
over large geographic regions, the most advanced models are often restricted by available …