Superneurons: Dynamic GPU memory management for training deep neural networks
Going deeper and wider in neural architectures improves their accuracy, while the limited
GPU DRAM places an undesired restriction on the network design domain. Deep Learning …
SLATE: Design of a modern distributed and accelerated linear algebra library
The SLATE (Software for Linear Algebra Targeting Exascale) library is being developed to
provide fundamental dense linear algebra capabilities for current and upcoming distributed …
Achieving high performance on supercomputers with a sequential task-based programming model
The emergence of accelerators as standard computing resources on supercomputers and
the subsequent architectural complexity increase revived the need for high-level parallel …
Real-time big data stream processing using GPU with Spark over Hadoop ecosystem
In this technological era, every person, authorities, entrepreneurs, businesses, and many
things around us are connected to the internet, forming Internet of Things (IoT). This generates …
Extreme-scale task-based Cholesky factorization toward climate and weather prediction applications
Climate and weather can be predicted statistically via geospatial Maximum Likelihood
Estimates (MLE), as an alternative to running large ensembles of forward models. The MLE …
Parallel applications often rely on work stealing schedulers in combination with fine-grained
tasking to achieve high performance and scalability. However, reducing the total energy …
Automated construction of high performance distributed programs in LuNA system
D Akhmed-Zaki, D Lebedev, V Malyshkin… - … Conference, PaCT 2019 …, 2019 - Springer
The paper concerns the problem of efficient distributed execution of fragmented programs in
LuNA system, which is an automated parallel programs construction system. In LuNA an …
Red: A systematic real-time scheduling approach for robotic environmental dynamics
Intelligent robots are designed to effectively navigate dynamic and unpredictable
environments laden with moving mechanical elements and objects. Such environment …