Predictive performance modeling for distributed batch processing using black box monitoring and machine learning
In many domains, the previous decade was characterized by increasing data volumes and
growing complexity of data analyses, creating new demands for batch processing on …
growing complexity of data analyses, creating new demands for batch processing on …
Machine learning the computational cost of quantum chemistry
Computational quantum mechanics based molecular and materials design campaigns
consume increasingly more high-performance computer resources, making improved job …
consume increasingly more high-performance computer resources, making improved job …
HPC I/O throughput bottleneck analysis with explainable local models
With the growing complexity of high-performance computing (HPC) systems, achieving high
performance can be difficult because of I/O bottlenecks. We analyze multiple years' worth of …
performance can be difficult because of I/O bottlenecks. We analyze multiple years' worth of …
[PDF][PDF] Machine learning predictions for underestimation of job runtime on HPC system
In modern high-performance computing (HPC) systems, users are usually requested to
estimate the job runtime for system scheduling when they submit a job. In general, an …
estimate the job runtime for system scheduling when they submit a job. In general, an …
AI4IO: A suite of AI-based tools for IO-aware scheduling
Traditional workload managers do not have the capacity to consider how IO contention can
increase job runtime and even cause entire resource allocations to be wasted. Whether from …
increase job runtime and even cause entire resource allocations to be wasted. Whether from …
HPC workload characterization using feature selection and clustering
Large high-performance computers (HPC) are expensive tools responsible for supporting
thousands of scientific applications. However, it is not easy to determine the best set of …
thousands of scientific applications. However, it is not easy to determine the best set of …
An SMDP approach for Reinforcement Learning in HPC cluster schedulers
Deep reinforcement learning applied to computing systems has shown potential for
improving system performance, as well as faster discovery of better allocation strategies. In …
improving system performance, as well as faster discovery of better allocation strategies. In …
[HTML][HTML] AMPRO-HPCC: A machine-learning tool for predicting resources on slurm HPC clusters
Determining resource allocations (memory and time) for submitted jobs in High Performance
Computing (HPC) systems is a challenging process even for computer scientists. HPC users …
Computing (HPC) systems is a challenging process even for computer scientists. HPC users …
Survey of memory management techniques for hpc and cloud computing
A Pupykina, G Agosta - IEEE Access, 2019 - ieeexplore.ieee.org
The emergence of new classes of HPC applications and usage models, such as real-time
HPC and cloud HPC, coupled with the increasingly heterogeneous nature of HPC …
HPC and cloud HPC, coupled with the increasingly heterogeneous nature of HPC …
Sizey: Memory-efficient execution of scientific workflow tasks
As the amount of available data continues to grow in fields as diverse as bioinformatics,
physics, and remote sensing, the importance of scientific workflows in the design and im …
physics, and remote sensing, the importance of scientific workflows in the design and im …