NeuRex: A case for neural rendering acceleration

J Lee, K Choi, J Lee, S Lee, J Whangbo… - Proceedings of the 50th …, 2023 - dl.acm.org
This paper presents NeuRex, an accelerator architecture that efficiently performs the modern
neural rendering pipeline with an algorithmic enhancement and supporting hardware …

DVABatch: Diversity-aware Multi-Entry Multi-Exit batching for efficient processing of DNN services on GPUs

W Cui, H Zhao, Q Chen, H Wei, Z Li, D Zeng… - 2022 USENIX Annual …, 2022 - usenix.org
The DNN inferences are often batched for better utilizing the hardware in existing DNN
serving systems. However, DNN serving exhibits diversity in many aspects, such as input …

Toward accelerated stencil computation by adapting tensor core unit on GPU

X Liu, Y Liu, H Yang, J Liao, M Li, Z Luan… - Proceedings of the 36th …, 2022 - dl.acm.org
The Tensor Core Unit (TCU) has been increasingly adopted on modern high performance
processors, specialized in boosting the performance of general matrix multiplication …

Improving GPU throughput through parallel execution using tensor cores and CUDA cores

K Ho, H Zhao, A Jog, S Mohanty - 2022 IEEE Computer Society …, 2022 - ieeexplore.ieee.org
To accelerate the execution of Machine Learning applications, recent GPUs use Tensor
cores to speed up the general matrix multiplication (GEMM), which is the heart of deep …

QoS-aware irregular collaborative inference for improving throughput of DNN services

K Fu, J Shi, Q Chen, N Zheng, W Zhang… - … Conference for High …, 2022 - ieeexplore.ieee.org
With collaborative DNN inference, part of queries run on their source edge device to reduce
latencies. Because edges show diverse performance and network conditions, different …

ROSGM: A real-time GPU management framework with plug-in policies for ROS 2

R Li, T Hu, X Jiang, L Li, W Xing… - 2023 IEEE 29th Real …, 2023 - ieeexplore.ieee.org
Robot Operating System (ROS) is a prevailing software framework for robotic application
development. Graphics Processing Unit (GPU) is widely used in many ROS applications as …

Benchmarking GPU Tensor Cores on General Matrix Multiplication Kernels through CUTLASS

X Huang, X Zhang, P Yang, N Xiao - Applied Sciences, 2023 - mdpi.com
GPUs have been broadly used to accelerate big data analytics, scientific computing and
machine intelligence. Particularly, matrix multiplication and convolution are two principal …

MSHGN: Multi-scenario adaptive hierarchical spatial graph convolution network for GPU utilization prediction in heterogeneous GPU clusters

S Wang, S Chen, F Meng, Y Shi - Journal of Parallel and Distributed …, 2024 - Elsevier
Accurately predicting GPU utilization is crucial for effectively managing heterogeneous GPU
clusters, yet existing prediction methods are tailored to homogeneous clusters or ignore the …

SRender: Boosting Neural Radiance Field Efficiency via Sensitivity-Aware Dynamic Precision Rendering

Z Song, H He, F Liu, Y Hao, X Song… - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Neural Radiance Field (NeRF) holds immense promise for generating photo-realistic
images and videos. However, the computation and memory demands significantly impede …

Maximizing the Utilization of GPUs Used by Cloud Gaming through Adaptive Co-location with Combo

B Chen, H Zhao, W Cui, Y He, S Zhang… - Proceedings of the …, 2023 - dl.acm.org
Cloud vendors are now providing cloud gaming services with GPUs. GPUs in cloud gaming
experience periods of idle because not every frame in a game always keeps the GPU busy …