SODA: Stencil with optimized dataflow architecture

Y Chi, J Cong, P Wei, P Zhou - 2018 IEEE/ACM International …, 2018 - ieeexplore.ieee.org
Stencil computation is one of the most important kernels in many application domains such
as image processing, solving partial differential equations, and cellular automata. Many of …

A heuristic clustering-based task deployment approach for load balancing using Bayes theorem in cloud environment

J Zhao, K Yang, X Wei, Y Ding, L Hu… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Aiming at the current problems that most physical hosts in the cloud data center are so
overloaded that it makes the whole cloud data center's load imbalanced and that existing load …

APUNet: Revitalizing GPU as packet processing accelerator

Y Go, MA Jamshed, YG Moon, C Hwang… - 14th USENIX Symposium …, 2017 - usenix.org
Many research works have recently experimented with GPU to accelerate packet processing
in network applications. Most works have shown that GPU brings a significant performance …

STELLA: A domain-specific tool for structured grid methods in weather and climate models

T Gysi, C Osuna, O Fuhrer, M Bianco… - Proceedings of the …, 2015 - dl.acm.org
Many high-performance computing applications solving partial differential equations (PDEs)
can be attributed to the class of kernels using stencils on structured grids. Due to the …

GPUvm: Why Not Virtualizing GPUs at the Hypervisor?

Y Suzuki, S Kato, H Yamada, K Kono - 2014 USENIX Annual Technical …, 2014 - usenix.org
Graphics processing units (GPUs) provide orders-of-magnitude speedup for compute-
intensive data-parallel applications. However, enterprise and cloud computing domains …

Impediments to understanding seagrasses' response to global change

BM Rock, BH Daru - Frontiers in Marine Science, 2021 - frontiersin.org
Uncertainties from sampling biases present challenges to ecologists and evolutionary
biologists in understanding species sensitivity to anthropogenic climate change. Here, we …

Improving strong-scaling of CNN training by exploiting finer-grained parallelism

N Dryden, N Maruyama, T Benson… - 2019 IEEE …, 2019 - ieeexplore.ieee.org
Scaling CNN training is necessary to keep up with growing datasets and reduce training
time. We also see an emerging need to handle datasets with very large samples, where …

GPU-aware MPI on RDMA-enabled clusters: Design, implementation and evaluation

H Wang, S Potluri, D Bureddy… - … on Parallel and …, 2013 - ieeexplore.ieee.org
Designing high-performance and scalable applications on GPU clusters requires tackling
several challenges. The key challenge is the separate host memory and device memory …

YASK—yet another stencil kernel: A framework for HPC stencil code-generation and tuning

C Yount, J Tobin, A Breuer… - 2016 Sixth International …, 2016 - ieeexplore.ieee.org
Stencil computation is an important class of algorithms used in a large variety of scientific-
simulation applications. While the code for many problems can certainly be written in a …

Multicore-optimized wavefront diamond blocking for optimizing stencil updates

T Malas, G Hager, H Ltaief, H Stengel, G Wellein… - SIAM Journal on …, 2015 - SIAM
The importance of stencil-based algorithms in computational science has focused attention
on optimized parallel implementations for multilevel cache-based processors. Temporal …