Extreme-scale ab initio quantum Raman spectra simulations on the leadership HPC system in China
H Shang, F Li, Y Zhang, L Zhang, Y Fu, Y Gao… - Proceedings of the …, 2021 - dl.acm.org
Raman spectroscopy provides chemical and compositional information that can serve as a
structural fingerprint for various materials. Therefore, simulations of Raman spectra …
Parallelization and optimization of NSGA-II on Sunway TaihuLight system
X Liu, J Sun, L Zheng, S Wang, Y Liu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Sunway TaihuLight system is the first supercomputer offering a peak performance over 100
PFlops, which can be utilized to parallelize Non-dominated Sorting Genetic Algorithm II …
Optimizing small channel 3D convolution on GPU with tensor core
In many scenarios, particularly scientific AI applications, algorithm engineers widely adopt
more complex convolution, e.g., 3D CNN, to improve the accuracy. Scientific AI applications …
YaConv: Convolution with low cache footprint
This article introduces YaConv, a new algorithm to compute convolution using GEMM
microkernels from a Basic Linear Algebra Subprograms library that is efficient for multiple …
Layup: Layer-adaptive and multi-type intermediate-oriented memory optimization for GPU-based CNNs
Although GPUs have emerged as the mainstream for the acceleration of convolutional
neural network (CNN) training processes, they usually have limited physical memory …
Accelerating all-electron ab initio simulation of Raman spectra for biological systems
H Shang, F Li, Y Zhang, Y Liu, L Zhang, M Wu… - Proceedings of the …, 2021 - dl.acm.org
Raman spectroscopy provides chemical and compositional information that can serve as a
structural fingerprint for various materials. Therefore, simulations of Raman spectra …
Bandwidth-aware loop tiling for DMA-supported scratchpad memory
M Wu, Y Liu, H Cui, Q Wei, Q Li, L Li, F Lv… - Proceedings of the …, 2020 - dl.acm.org
Scratchpad Memory (SPM) is widely used in emerging domain-specific architectures and
accelerators for improving energy efficiency and time predictability. Typically, SPM-based …
Large-scale automatic k-means clustering for heterogeneous many-core supercomputer
This article presents an automatic k-means clustering solution targeting the Sunway
TaihuLight supercomputer. We first introduce a multilevel parallel partition approach that not …
High performance reconfigurable computing for numerical simulation and deep learning
Due to their customizable on-chip resources, reconfigurable computing platforms such as
FPGAs are able to achieve better time-to-solution and energy-to-solution than general …
A dynamic agricultural prediction system for large-scale drought assessment on the Sunway TaihuLight supercomputer
Crop models are widely used to evaluate the response of crop growth to drought. However,
over large geographic regions, the most advanced models are often restricted by available …