Acceleration of graph neural network-based prediction models in chemistry via co-design optimization on intelligence processing units

H Helal, J Firoz, JA Bilbrey, H Sprueill… - Journal of Chemical …, 2024 - ACS Publications
Atomic structure prediction and associated property calculations are the bedrock of chemical
physics. Since high-fidelity ab initio modeling techniques for computing the structure and …

Asynchronous Memory Access Unit: Exploiting Massive Parallelism for Far Memory Access

L Wang, X Zhang, S Wang, Z Jiang, T Lu… - ACM Transactions on …, 2024 - dl.acm.org
The growing memory demands of modern applications have driven the adoption of far
memory technologies in data centers to provide cost-effective, high-capacity memory …

In-memory graph databases for web-scale data

VG Castellana, A Morari, J Weaver, A Tumeo… - Computer, 2015 - ieeexplore.ieee.org
Cover feature on big data, published in Computer by the IEEE Computer Society, © 2015
IEEE …

Itoyori: Reconciling global address space and global fork-join task parallelism

S Shiina, K Taura - Proceedings of the International Conference for High …, 2023 - dl.acm.org
This paper introduces Itoyori, a task-parallel runtime system designed to tackle the
challenge of scaling task parallelism (more specifically, nested fork-join parallelism) beyond …

Caching puts and gets in a PGAS language runtime

MP Ferguson, D Buettner - 2015 9th International Conference …, 2015 - ieeexplore.ieee.org
We investigated a software cache for PGAS PUT and GET operations. The cache is
implemented as a software write-back cache with dirty bits, local memory consistency …

SHAD: The scalable high-performance algorithms and data-structures library

VG Castellana, M Minutoli - 2018 18th IEEE/ACM International …, 2018 - ieeexplore.ieee.org
The unprecedented amount of data that needs to be processed in emerging data analytics
applications poses novel challenges to industry and academia. Scalability and high …

Practical distributed programming in C++

M Drocco, VG Castellana, M Minutoli - Proceedings of the 29th …, 2020 - dl.acm.org
The need for coupling high performance with productivity is steering the recent evolution of
the C++ language where low-level aspects of parallel and distributed computing are now …

Graphine: Programming graph-parallel computation of large natural graphs for multicore clusters

J Yan, G Tan, Z Mo, N Sun - IEEE Transactions on Parallel and …, 2015 - ieeexplore.ieee.org
Graph-parallel computation has become a crucial component in emerging applications of
web search, data analytics and machine learning. In practice, most graphs derived from real …

Gravel: Fine-grain GPU-initiated network messages

MS Orr, S Che, BM Beckmann, M Oskin… - Proceedings of the …, 2017 - dl.acm.org
Distributed systems incorporate GPUs because they provide massive parallelism in an
energy-efficient manner. Unfortunately, existing programming models make it difficult to …

Extending OpenSHMEM with aggregation support for improved message rate performance

A Welch, O Hernandez, S Poole - European Conference on Parallel …, 2023 - Springer
OpenSHMEM is a highly efficient one-sided communication API that implements the PGAS
parallel programming model, and is known for its low latency communication operations that …