SODA: Stencil with optimized dataflow architecture
Stencil computation is one of the most important kernels in many application domains such
as image processing, solving partial differential equations, and cellular automata. Many of …
A heuristic clustering-based task deployment approach for load balancing using Bayes theorem in cloud environment
Aiming at the current problems that most physical hosts in the cloud data center are so
overloaded that it makes the whole cloud data center's load imbalanced and that existing load …
APUNet: Revitalizing GPU as packet processing accelerator
Many research works have recently experimented with GPU to accelerate packet processing
in network applications. Most works have shown that GPU brings a significant performance …
STELLA: A domain-specific tool for structured grid methods in weather and climate models
Many high-performance computing applications solving partial differential equations (PDEs)
can be attributed to the class of kernels using stencils on structured grids. Due to the …
GPUvm: Why Not Virtualizing GPUs at the Hypervisor?
Graphics processing units (GPUs) provide orders-of-magnitude speedup for compute-
intensive data-parallel applications. However, enterprise and cloud computing domains …
Impediments to understanding seagrasses' response to global change
Uncertainties from sampling biases present challenges to ecologists and evolutionary
biologists in understanding species sensitivity to anthropogenic climate change. Here, we …
Improving strong-scaling of CNN training by exploiting finer-grained parallelism
N Dryden, N Maruyama, T Benson… - 2019 IEEE …, 2019 - ieeexplore.ieee.org
Scaling CNN training is necessary to keep up with growing datasets and reduce training
time. We also see an emerging need to handle datasets with very large samples, where …
GPU-aware MPI on RDMA-enabled clusters: Design, implementation and evaluation
Designing high-performance and scalable applications on GPU clusters requires tackling
several challenges. The key challenge is the separate host memory and device memory …
Yask—yet another stencil kernel: A framework for hpc stencil code-generation and tuning
Stencil computation is an important class of algorithms used in a large variety of scientific-
simulation applications. While the code for many problems can certainly be written in a …
Multicore-optimized wavefront diamond blocking for optimizing stencil updates
The importance of stencil-based algorithms in computational science has focused attention
on optimized parallel implementations for multilevel cache-based processors. Temporal …