86 PFLOPS deep potential molecular dynamics simulation of 100 million atoms with ab initio accuracy

D Lu, H Wang, M Chen, L Lin, R Car, E Weinan… - Computer Physics …, 2021 - Elsevier
We present the GPU version of DeePMD-kit, which, upon training a deep neural network
model using ab initio data, can drive extremely large-scale molecular dynamics (MD) …

MCM-GPU: Multi-chip-module GPUs for continued performance scalability

A Arunkumar, E Bolotin, B Cho, U Milic… - ACM SIGARCH …, 2017 - dl.acm.org
Historically, improvements in GPU-based high performance computing have been tightly
coupled to transistor scaling. As Moore's law slows down, and the number of transistors per …

Scheduling techniques for GPU architectures with processing-in-memory capabilities

A Pattnaik, X Tang, A Jog, O Kayiran… - Proceedings of the …, 2016 - dl.acm.org
Processing data in or near memory (PIM), as opposed to in conventional computational units
in a processor, can greatly alleviate the performance and energy penalties of data transfers …

Exploiting hierarchical context on a large database of object categories

MJ Choi, JJ Lim, A Torralba… - 2010 IEEE computer …, 2010 - ieeexplore.ieee.org
There has been a growing interest in exploiting contextual information in addition to local
features to detect and localize multiple object categories in an image. Context models can …

Gdev: First-Class GPU Resource Management in the Operating System

S Kato, M McThrow, C Maltzahn, S Brandt - 2012 USENIX Annual …, 2012 - usenix.org
Graphics processing units (GPUs) have become a very powerful platform embracing a
concept of heterogeneous many-core computing. However, application domains of GPUs …

Adaptive heterogeneous scheduling for integrated GPUs

R Kaleem, R Barik, T Shpeisman, BT Lewis… - Proceedings of the 23rd …, 2014 - dl.acm.org
Many processors today integrate a CPU and GPU on the same die, which allows them to
share resources like physical memory and lowers the cost of CPU-GPU communication. As …

Chai: Collaborative heterogeneous applications for integrated-architectures

J Gómez-Luna, I El Hajj, LW Chang… - … Analysis of Systems …, 2017 - ieeexplore.ieee.org
Heterogeneous system architectures are evolving towards tighter integration among
devices, with emerging features such as shared virtual memory, memory coherence, and …

Effisha: A software framework for enabling efficient preemptive scheduling of GPU

G Chen, Y Zhao, X Shen, H Zhou - … on Principles and Practice of Parallel …, 2017 - dl.acm.org
Modern GPUs are broadly adopted in many multitasking environments, including data
centers and smartphones. However, the current support for the scheduling of multiple GPU …

RGEM: A responsive GPGPU execution model for runtime engines

S Kato, K Lakshmanan, A Kumar… - 2011 IEEE 32nd …, 2011 - ieeexplore.ieee.org
General-purpose computing on graphics processing units, also known as GPGPU, is a
burgeoning technique to enhance the computation of parallel programs. Applying this …

Enabling and exploiting flexible task assignment on GPU through SM-centric program transformations

B Wu, G Chen, D Li, X Shen, J Vetter - Proceedings of the 29th ACM on …, 2015 - dl.acm.org
A GPU's computing power lies in its abundant memory bandwidth and massive parallelism.
However, its hardware thread schedulers, despite being able to quickly distribute …