Parallel programming models for heterogeneous many-cores: a comprehensive survey

J Fang, C Huang, T Tang, Z Wang - CCF Transactions on High …, 2020 - Springer
Heterogeneous many-cores are now an integral part of modern computing systems ranging
from embedded systems to supercomputers. While heterogeneous many-core design offers …

Dynamic GPU energy optimization for machine learning training workloads

F Wang, W Zhang, S Lai, M Hao… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
GPUs are widely used to accelerate the training of machine learning workloads. As modern
machine learning models become increasingly larger, they require a longer time to train …

LIBSHALOM: Optimizing small and irregular-shaped matrix multiplications on ARMv8 multi-cores

W Yang, J Fang, D Dong, X Su, Z Wang - Proceedings of the …, 2021 - dl.acm.org
General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing.
While the mainstream linear algebra libraries can deliver high performance on large and …

Deep program structure modeling through multi-relational graph-based learning

G Ye, Z Tang, H Wang, D Fang, J Fang… - Proceedings of the …, 2020 - dl.acm.org
Deep learning is emerging as a promising technique for building predictive models to
support code-related tasks like performance optimization and code vulnerability detection …

Kernel-as-a-Service: A serverless programming model for heterogeneous hardware accelerators

T Pfandzelter, A Dhakal, E Frachtenberg… - Proceedings of the 24th …, 2023 - dl.acm.org
With the slowing of Moore's law and decline of Dennard scaling, computing systems
increasingly rely on specialized hardware accelerators in addition to general-purpose …

Optimizing sparse matrix multiplications for graph neural networks

S Qiu, L You, Z Wang - … Workshop on Languages and Compilers for …, 2021 - Springer
Graph neural networks (GNNs) are emerging as a powerful technique for modeling graph
structures. Due to the sparsity of real-world graph data, GNN performance is limited by …

Online power management for multi-cores: A reinforcement learning based approach

Y Wang, W Zhang, M Hao… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Power and energy are first-class design constraints for multi-core processors and are a
limiting factor for future-generation supercomputers. While modern processor design …

ML-Based Dynamic Operator-Level Query Mapping for Stream Processing Systems in Heterogeneous Computing Environments

S Oh, GE Moon, S Park - 2024 IEEE International Conference …, 2024 - ieeexplore.ieee.org
Mapping queries to optimal computing devices at the operator level presents a significant
challenge in stream processing systems (SPS) with heterogeneous computing resources …

Compiler-directed scratchpad memory data transfer optimization for multithreaded applications on a heterogeneous many-core architecture

X Tao, J Pang, J Xu, Y Zhu - The Journal of Supercomputing, 2021 - Springer
The heterogeneous many-core architecture plays an important role in the fields of high-performance
computing and scientific computing. It uses accelerator cores with on-chip …

JavaScript Performance Tuning as a Crowdsourced Service

J Ren, L Gao, Z Wang - IEEE Transactions on Mobile …, 2023 - ieeexplore.ieee.org
JavaScript (JS) is one of the most widely used programming languages for mobile applications. As
JS is increasingly used in computation-intensive and latency-sensitive components, JS …