Superneurons: Dynamic GPU memory management for training deep neural networks

L Wang, J Ye, Y Zhao, W Wu, A Li, SL Song… - Proceedings of the 23rd …, 2018 - dl.acm.org
Going deeper and wider in neural architectures improves their accuracy, while the limited
GPU DRAM places an undesired restriction on the network design domain. Deep Learning …

SLATE: Design of a modern distributed and accelerated linear algebra library

M Gates, J Kurzak, A Charara, A YarKhan… - Proceedings of the …, 2019 - dl.acm.org
The SLATE (Software for Linear Algebra Targeting Exascale) library is being developed to
provide fundamental dense linear algebra capabilities for current and upcoming distributed …

Achieving high performance on supercomputers with a sequential task-based programming model

E Agullo, O Aumage, M Faverge… - … on Parallel and …, 2017 - ieeexplore.ieee.org
The emergence of accelerators as standard computing resources on supercomputers and
the subsequent architectural complexity increase revived the need for high-level parallel …

Real-time big data stream processing using GPU with spark over hadoop ecosystem

MM Rathore, H Son, A Ahmad, A Paul… - International Journal of …, 2018 - Springer
In this technological era, every person, authorities, entrepreneurs, businesses, and many
things around us are connected to the internet, forming Internet of Things (IoT). This generates …

Extreme-scale task-based cholesky factorization toward climate and weather prediction applications

Q Cao, Y Pei, K Akbudak, A Mikhalev… - Proceedings of the …, 2020 - dl.acm.org
Climate and weather can be predicted statistically via geospatial Maximum Likelihood
Estimates (MLE), as an alternative to running large ensembles of forward models. The MLE …

J Chen, M Manivannan, M Abduljabbar… - ACM Transactions on …, 2022 - dl.acm.org
Parallel applications often rely on work stealing schedulers in combination with fine-grained
tasking to achieve high performance and scalability. However, reducing the total energy …

Automated construction of high performance distributed programs in LuNA system

D Akhmed-Zaki, D Lebedev, V Malyshkin… - … Conference, PaCT 2019 …, 2019 - Springer
The paper concerns the problem of efficient distributed execution of fragmented programs in
LuNA system, which is an automated parallel programs construction system. In LuNA an …

Red: A systematic real-time scheduling approach for robotic environmental dynamics

Z Li, T Ren, X He, C Liu - 2023 IEEE Real-Time Systems …, 2023 - ieeexplore.ieee.org
Intelligent robots are designed to effectively navigate dynamic and unpredictable
environments laden with moving mechanical elements and objects. Such environment …