A taxonomy of live migration management in cloud computing
Cloud Data Centers have become the key infrastructure for providing services. Instance
migration across different computing nodes in edge and cloud computing is essential to …
In-depth analyses of unified virtual memory system for GPU accelerated computing
The abstraction of a shared memory space over separate CPU and GPU memory domains
has eased the burden of portability for many HPC codebases. However, users pay for the …
SAC: Sharing-aware caching in multi-chip GPUs
Bandwidth non-uniformity in multi-chip GPUs poses a major design challenge for its last-level cache (LLC) architecture. Whereas a memory-side LLC caches data from the local …
IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations
Multi-GPU systems have emerged as a desirable platform to deliver high computing
capabilities and large memory capacity to accommodate large dataset sizes. However …
Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs
With the advancement of processor packaging technology and the looming end of Moore's
law, multi-chip-module (MCM) GPUs become a promising architecture to continue the …
Locality-centric data and threadblock management for massive GPUs
Recent work has shown that building GPUs with hundreds of SMs in a single monolithic chip
will not be practical due to slowing growth in transistor density, low chip yields, and …
Improving address translation in multi-GPUs via sharing and spilling aware TLB design
In recent years, the ever-growing application complexity and input dataset sizes have driven
the popularity of multi-GPU systems as a desirable computing platform for many application …
GPS: A global publish-subscribe model for multi-GPU memory management
Suboptimal management of memory and bandwidth is one of the primary causes of low
performance on systems comprising multiple GPUs. Existing memory management solutions …
Demystifying GPU UVM cost with deep runtime and workload analysis
With GPUs becoming ubiquitous in HPC systems, NVIDIA's Unified Virtual Memory (UVM) is
being adopted as a measure to simplify porting of complex codes to GPU platforms by …
SnakeByte: A TLB design with adaptive and recursive page merging in GPUs
This paper presents an address translation scheme in GPUs named SnakeByte that can
dynamically manage variable-sized pages and maximize TLB reach by recursively merging …