NeuRex: A case for neural rendering acceleration
This paper presents NeuRex, an accelerator architecture that efficiently performs the modern
neural rendering pipeline with an algorithmic enhancement and supporting hardware …
DVABatch: Diversity-aware Multi-Entry Multi-Exit batching for efficient processing of DNN services on GPUs
The DNN inferences are often batched for better utilizing the hardware in existing DNN
serving systems. However, DNN serving exhibits diversity in many aspects, such as input …
Toward accelerated stencil computation by adapting tensor core unit on GPU
The Tensor Core Unit (TCU) has been increasingly adopted on modern high performance
processors, specialized in boosting the performance of general matrix multiplication …
Improving GPU throughput through parallel execution using tensor cores and CUDA cores
To accelerate the execution of Machine Learning applications, recent GPUs use Tensor
cores to speed up the general matrix multiplication (GEMM), which is the heart of deep …
QoS-aware irregular collaborative inference for improving throughput of DNN services
With collaborative DNN inference, part of the queries run on their source edge device to reduce
latencies. Because edges show diverse performance and network conditions, different …
Rosgm: A real-time GPU management framework with plug-in policies for ROS 2
R Li, T Hu, X Jiang, L Li, W **ng… - 2023 IEEE 29th Real …, 2023 - ieeexplore.ieee.org
Robot Operating System (ROS) is a prevailing software framework for robotic application
development. Graphics Processing Unit (GPU) is widely used in many ROS applications as …
Benchmarking GPU Tensor Cores on General Matrix Multiplication Kernels through CUTLASS
X Huang, X Zhang, P Yang, N **ao - Applied Sciences, 2023 - mdpi.com
GPUs have been broadly used to accelerate big data analytics, scientific computing and
machine intelligence. Particularly, matrix multiplication and convolution are two principal …
MSHGN: Multi-scenario adaptive hierarchical spatial graph convolution network for GPU utilization prediction in heterogeneous GPU clusters
S Wang, S Chen, F Meng, Y Shi - Journal of Parallel and Distributed …, 2024 - Elsevier
Accurately predicting GPU utilization is crucial for effectively managing heterogeneous GPU
clusters, yet existing prediction methods are tailored to homogeneous clusters or ignore the …
SRender: Boosting Neural Radiance Field Efficiency via Sensitivity-Aware Dynamic Precision Rendering
Neural Radiance Field (NeRF) holds immense promise for generating photo-realistic
images and videos. However, the computation and memory demands significantly impede …
Maximizing the Utilization of GPUs Used by Cloud Gaming through Adaptive Co-location with Combo
Cloud vendors are now providing cloud gaming services with GPUs. GPUs in cloud gaming
experience idle periods because not every frame in a game always keeps the GPU busy …