Survey of scheduling techniques for addressing shared resources in multicore processors
Chip multicore processors (CMPs) have emerged as the dominant architecture choice for
modern computing platforms and will most likely continue to be dominant well into the …
Fairness in serving large language models
High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of
requests from short chat conversations to long document reading. To ensure that all client …
Heracles: Improving resource efficiency at scale
User-facing, latency-sensitive services, such as websearch, underutilize their computing
resources during daily periods of low traffic. Reusing those resources for other tasks is rarely …
Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations
As much of the world's computing continues to move into the cloud, the overprovisioning of
computing resources to ensure the performance isolation of latency-sensitive tasks, such as …
A case for exploiting subarray-level parallelism (SALP) in DRAM
Modern DRAMs have multiple banks to serve multiple memory requests in parallel.
However, when two requests go to the same bank, they have to be served serially …
Memguard: Memory bandwidth reservation system for efficient performance isolation in multi-core platforms
Memory bandwidth in modern multi-core platforms is highly variable for many reasons and is
a big challenge in designing real-time systems as applications are increasingly becoming …
Thread cluster memory scheduling: Exploiting differences in memory access behavior
In a modern chip-multiprocessor system, memory is a shared resource among multiple
concurrently executing threads. The memory scheduling algorithm should resolve memory …
Self-optimizing memory controllers: A reinforcement learning approach
Efficiently utilizing off-chip DRAM bandwidth is a critical issue in designing cost-effective,
high-performance chip multiprocessors (CMPs). Conventional memory controllers deliver …
Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems
O Mutlu, T Moscibroda - ACM SIGARCH Computer Architecture News, 2008 - dl.acm.org
In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a
shared DRAM system, requests from a thread can not only delay requests from other threads …
ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers
Modern chip multiprocessor (CMP) systems employ multiple memory controllers to control
access to main memory. The scheduling algorithm employed by these memory controllers …