Characterizing power management opportunities for LLMs in the cloud

P Patel, E Choukse, C Zhang, Í Goiri, B Warrier… - Proceedings of the 29th …, 2024 - dl.acm.org
Recent innovation in large language models (LLMs), and their myriad use cases have
rapidly driven up the compute demand for datacenter GPUs. Several cloud providers and …

Designing cloud servers for lower carbon

J Wang, DS Berger, F Kazhamiaka… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
To mitigate climate change, we must reduce carbon emissions from hyperscale cloud
computing. We find that cloud compute servers cause the majority of emissions in a general …

DynamoLLM: Designing LLM inference clusters for performance and energy efficiency

J Stojkovic, C Zhang, Í Goiri, J Torrellas… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid evolution and widespread adoption of generative large language models (LLMs)
have made them a pivotal workload in various applications. Today, LLM inference clusters …

Hyrax: Fail-in-Place server operation in cloud platforms

J Lyu, M You, C Irvene, M Jung, T Narmore… - … USENIX Symposium on …, 2023 - usenix.org
Today's cloud platforms handle server hardware failures by shutting down the affected
server and only turning it back online once it has been repaired by a technician. At cloud …

Cost-efficient overclocking in immersion-cooled datacenters

M Jalili, I Manousakis, Í Goiri, PA Misra… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Cloud providers typically use air-based solutions for cooling servers in datacenters.
However, increasing transistor counts and the end of Dennard scaling will result in chips …

Peeling back the carbon curtain: Carbon optimization challenges in cloud computing

J Wang, U Gupta, A Sriraman - Proceedings of the 2nd Workshop on …, 2023 - dl.acm.org
The increasing carbon emissions from cloud computing require new methods to reduce its
environmental impact. We explore extending data center server lifetimes to reduce …

Flex: High-availability datacenters with zero reserved power

C Zhang, AG Kumbhare, I Manousakis… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Cloud providers, like Amazon and Microsoft, must guarantee high availability for a large
fraction of their workloads. For this reason, they build datacenters with redundant …

SmartOClock: Workload-and risk-aware overclocking in the cloud

J Stojkovic, PA Misra, Í Goiri, S Whitlock… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Operating server components beyond their voltage and power design limit (i.e., overclocking)
enables improving performance and lowering cost for cloud workloads. However …

Towards improved power management in cloud GPUs

P Patel, Z Gong, S Rizvi, E Choukse… - IEEE Computer …, 2023 - ieeexplore.ieee.org
As modern server GPUs are increasingly power intensive, better power management
mechanisms can significantly reduce the power consumption, capital costs, and carbon …

Redesigning data centers for renewable energy

A Agarwal, J Sun, S Noghabi, S Iyengar… - Proceedings of the 20th …, 2021 - dl.acm.org
Renewable energy is becoming an important power source for data centers, especially with
the zero-carbon waste pledges made by big cloud providers. However, one of the main …