Survey of scheduling techniques for addressing shared resources in multicore processors

S Zhuravlev, JC Saez, S Blagodurov… - ACM Computing …, 2012 - dl.acm.org
Chip multicore processors (CMPs) have emerged as the dominant architecture choice for
modern computing platforms and will most likely continue to be dominant well into the …

Fairness in serving large language models

Y Sheng, S Cao, D Li, B Zhu, Z Li, D Zhuo… - … USENIX Symposium on …, 2024 - usenix.org
High-demand LLM inference services (e.g., ChatGPT and BARD) support a wide range of
requests from short chat conversations to long document reading. To ensure that all client …

Heracles: Improving resource efficiency at scale

D Lo, L Cheng, R Govindaraju… - Proceedings of the …, 2015 - dl.acm.org
User-facing, latency-sensitive services, such as websearch, underutilize their computing
resources during daily periods of low traffic. Reusing those resources for other tasks is rarely …

Bubble-up: Increasing utilization in modern warehouse scale computers via sensible co-locations

J Mars, L Tang, R Hundt, K Skadron… - Proceedings of the 44th …, 2011 - dl.acm.org
As much of the world's computing continues to move into the cloud, the overprovisioning of
computing resources to ensure the performance isolation of latency-sensitive tasks, such as …

A case for exploiting subarray-level parallelism (SALP) in DRAM

Y Kim, V Seshadri, D Lee, J Liu, O Mutlu - ACM SIGARCH Computer …, 2012 - dl.acm.org
Modern DRAMs have multiple banks to serve multiple memory requests in parallel.
However, when two requests go to the same bank, they have to be served serially …
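The bank-conflict effect described in the SALP snippet can be illustrated with a toy model. This is a minimal sketch, not the paper's method. It assumes a fixed per-request service time, that all requests arrive at once, and that there is no subarray-level parallelism. `completion_time` and `SERVICE_TIME` are hypothetical names introduced only for illustration.

```python
from collections import defaultdict

# Hypothetical fixed service latency per request, in arbitrary time units.
SERVICE_TIME = 10


def completion_time(bank_ids, service_time=SERVICE_TIME):
    """Return the time at which all requests finish.

    Simplified model (an assumption, not from the paper): every request
    arrives at t=0, each bank serves one request at a time, and different
    banks operate fully in parallel.
    """
    busy_until = defaultdict(int)  # bank id -> time at which the bank becomes free
    for bank in bank_ids:
        # Requests that map to the same bank queue up and are served serially.
        busy_until[bank] += service_time
    return max(busy_until.values(), default=0)


print(completion_time([0, 1, 2, 3]))  # four different banks: service fully overlaps
print(completion_time([0, 0, 0, 0]))  # one bank: service is fully serialized
```

Under these assumptions, four requests spread across four banks finish in one service time (10), while four requests to the same bank take four times as long (40). That serialization is the gap the SALP paper's title says it targets by exploiting parallelism below the bank level.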

Memguard: Memory bandwidth reservation system for efficient performance isolation in multi-core platforms

H Yun, G Yao, R Pellizzoni… - 2013 IEEE 19th Real …, 2013 - ieeexplore.ieee.org
Memory bandwidth in modern multi-core platforms is highly variable for many reasons and is
a big challenge in designing real-time systems as applications are increasingly becoming …

Thread cluster memory scheduling: Exploiting differences in memory access behavior

Y Kim, M Papamichael, O Mutlu… - 2010 43rd Annual …, 2010 - ieeexplore.ieee.org
In a modern chip-multiprocessor system, memory is a shared resource among multiple
concurrently executing threads. The memory scheduling algorithm should resolve memory …

Self-optimizing memory controllers: A reinforcement learning approach

E Ipek, O Mutlu, JF Martínez, R Caruana - ACM SIGARCH Computer …, 2008 - dl.acm.org
Efficiently utilizing off-chip DRAM bandwidth is a critical issue in designing cost-effective,
high-performance chip multiprocessors (CMPs). Conventional memory controllers deliver …

Parallelism-aware batch scheduling: Enhancing both performance and fairness of shared DRAM systems

O Mutlu, T Moscibroda - ACM SIGARCH Computer Architecture News, 2008 - dl.acm.org
In a chip-multiprocessor (CMP) system, the DRAM system is shared among cores. In a
shared DRAM system, requests from a thread can not only delay requests from other threads …

ATLAS: A scalable and high-performance scheduling algorithm for multiple memory controllers

Y Kim, D Han, O Mutlu… - HPCA-16 2010 The …, 2010 - ieeexplore.ieee.org
Modern chip multiprocessor (CMP) systems employ multiple memory controllers to control
access to main memory. The scheduling algorithm employed by these memory controllers …