Hardware compute partitioning on NVIDIA GPUs
Embedded and autonomous systems are increasingly integrating AI/ML features, often
enabled by a hardware accelerator such as a GPU. As these workloads become …
Neural architecture sizing for autonomous systems
Neural networks (NNs) are now widely used for perception processing in autonomous
systems. Data from sensors like cameras and lidars, after being processed by NNs, feed …
Cache bank-aware denial-of-service attacks on multicore ARM processors
In this paper, we identify that bank contention in the shared last-level cache (LLC) of
multicore processors can cause significant execution time slowdown to cross-core victims …
Analysis and mitigation of shared resource contention on heterogeneous multicore: An industrial case study
In this paper, we present a solution to the industrial challenge put forth by ARM in 2022. We
systematically analyze the effect of shared resource contention on an augmented reality …
Utilizing Machine Learning Techniques for Worst-Case Execution Time Estimation on GPU Architectures
The massive parallelism provided by Graphics Processing Units (GPUs) to accelerate
compute-intensive tasks makes it preferable for Real-Time Systems such as autonomous …
Memory-Aware Latency Prediction Model for Concurrent Kernels in Partitionable GPUs: Simulations and Experiments
The current trend in recently released Graphics Processing Units (GPUs) is to exploit
transistor scaling at the architectural level, hence, larger and larger GPUs in every new chip …
Per-Bank Bandwidth Regulation of Shared Last-Level Cache for Real-Time Systems
Modern commercial-off-the-shelf (COTS) multicore processors have advanced memory
hierarchies that enhance memory-level parallelism (MLP), which is crucial for high …
Towards Efficient Parallel GPU Scheduling: Interference Awareness with Schedule Abstraction
GPUs are powerful computing architectures that are increasingly used in embedded
systems for implementing complex intelligent applications. Unfortunately, it is difficult to …
Missile: Fine-Grained, Hardware-Level GPU Resource Isolation for Multi-Tenant DNN Inference
Colocating high-priority, latency-sensitive (LS) and low-priority, best-effort (BE) DNN
inference services reduces the total cost of ownership (TCO) of GPU clusters. Limited by …
Real-Time Scheduling for Computing Architectures
An operating system (OS) is a supervisory program in a computing system, responsible for
efficient management of the hardware resources. In the context of real-time systems, that is …