Hardware compute partitioning on NVIDIA GPUs

J Bakita, JH Anderson - 2023 IEEE 29th Real-Time and …, 2023 - ieeexplore.ieee.org
Embedded and autonomous systems are increasingly integrating AI/ML features, often
enabled by a hardware accelerator such as a GPU. As these workloads become …

Neural architecture sizing for autonomous systems

S Xu, C Hobbs, Y Song, B Ghosh… - 2024 ACM/IEEE 15th …, 2024 - ieeexplore.ieee.org
Neural networks (NNs) are now widely used for perception processing in autonomous
systems. Data from sensors like cameras and lidars, after being processed by NNs, feed …

Cache bank-aware denial-of-service attacks on multicore ARM processors

M Bechtel, H Yun - 2023 IEEE 29th Real-Time and Embedded …, 2023 - ieeexplore.ieee.org
In this paper, we identify that bank contention in the shared last-level cache (LLC) of
multicore processors can cause significant execution-time slowdowns for cross-core victims …

Analysis and mitigation of shared resource contention on heterogeneous multicore: An industrial case study

M Bechtel, H Yun - IEEE Transactions on Computers, 2024 - ieeexplore.ieee.org
In this paper, we present a solution to the industrial challenge put forth by ARM in 2022. We
systematically analyze the effect of shared resource contention on an augmented reality …

Utilizing Machine Learning Techniques for Worst-Case Execution Time Estimation on GPU Architectures

V Kumar, B Ranjbar, A Kumar - IEEE Access, 2024 - ieeexplore.ieee.org
The massive parallelism provided by Graphics Processing Units (GPUs) to accelerate
compute-intensive tasks makes them preferable for Real-Time Systems such as autonomous …

Memory-Aware Latency Prediction Model for Concurrent Kernels in Partitionable GPUs: Simulations and Experiments

A Masola, N Capodieci, R Cavicchioli… - Workshop on Job …, 2023 - Springer
The current trend in recently released Graphics Processing Units (GPUs) is to exploit
transistor scaling at the architectural level, hence, larger and larger GPUs in every new chip …

Per-Bank Bandwidth Regulation of Shared Last-Level Cache for Real-Time Systems

C Sullivan, A Manley, M Alian… - 2024 IEEE Real-Time …, 2024 - ieeexplore.ieee.org
Modern commercial-off-the-shelf (COTS) multicore processors have advanced memory
hierarchies that enhance memory-level parallelism (MLP), which is crucial for high …

Towards Efficient Parallel GPU Scheduling: Interference Awareness with Schedule Abstraction

N Feddal, G Lipari, HE Zahaf - … of the 32nd International Conference on …, 2024 - dl.acm.org
GPUs are powerful computing architectures that are increasingly used in embedded
systems for implementing complex intelligent applications. Unfortunately, it is difficult to …

Missile: Fine-Grained, Hardware-Level GPU Resource Isolation for Multi-Tenant DNN Inference

Y Zhang, H Yu, C Han, C Wang, B Lu, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Colocating high-priority, latency-sensitive (LS) and low-priority, best-effort (BE) DNN
inference services reduces the total cost of ownership (TCO) of GPU clusters. Limited by …

Real-Time Scheduling for Computing Architectures

A Easwaran, M Yuhas, S Ramanathan… - Handbook of Computer …, 2024 - Springer
An operating system (OS) is a supervisory program in a computing system, responsible for
efficient management of the hardware resources. In the context of real-time systems, that is …