Energy-aware scheduling for high-performance computing systems: A survey
High-performance computing (HPC), according to its name, is traditionally oriented toward
performance, especially the execution time and scalability of the computations. However …
DeepRecSys: A system for optimizing end-to-end at-scale neural recommendation inference
Neural personalized recommendation is the cornerstone of a wide collection of cloud
services and products, constituting significant compute demand of cloud infrastructure. Thus …
Summarizing CPU and GPU design trends with product data
Moore's Law and Dennard Scaling have guided the semiconductor industry for the past few
decades. Recently, both laws have faced validity challenges as transistor sizes approach …
MGPUSim: Enabling multi-GPU performance modeling and optimization
The rapidly growing popularity and scale of data-parallel workloads demand a
corresponding increase in raw computational power of Graphics Processing Units (GPUs) …
Chai: Collaborative heterogeneous applications for integrated-architectures
Heterogeneous system architectures are evolving towards tighter integration among
devices, with emerging features such as shared virtual memory, memory coherence, and …
GNNMark: A benchmark suite to characterize graph neural network training on GPUs
Graph Neural Networks (GNNs) have emerged as a promising class of Machine Learning
algorithms to train on non-Euclidean data. GNNs are widely used in recommender systems …
Grus: Toward unified-memory-efficient high-performance graph processing on GPU
Today's GPU graph processing frameworks face scalability and efficiency issues as the
graph size exceeds GPU-dedicated memory limit. Although recent GPUs can over-subscribe …
Griffin: Hardware-software support for efficient page migration in multi-GPU systems
As transistor scaling becomes increasingly more difficult to achieve, scaling the core count
on a single GPU chip has also become extremely challenging. As the volume of data to …
IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations
Multi-GPU systems have emerged as a desirable platform to deliver high computing
capabilities and large memory capacity to accommodate large dataset sizes. However …
Analysis and modeling of collaborative execution strategies for heterogeneous CPU-FPGA architectures
Heterogeneous CPU-FPGA systems are evolving towards tighter integration between CPUs
and FPGAs for improved performance and energy efficiency. At the same time …