Splitwise: Efficient generative LLM inference using phase splitting
Generative large language model (LLM) applications are growing rapidly, leading to
large-scale deployments of expensive and power-hungry GPUs. Our characterization of LLM …
Power-aware Deep Learning Model Serving with μ-Serve
With the increasing popularity of large deep learning model-serving workloads, there is a
pressing need to reduce the energy consumption of a model-serving cluster while …
Designing cloud servers for lower carbon
To mitigate climate change, we must reduce carbon emissions from hyperscale cloud
computing. We find that cloud compute servers cause the majority of emissions in a general …
Reducing energy bloat in large model training
Training large AI models on numerous GPUs consumes a massive amount of energy,
making power delivery one of the largest limiting factors in building and operating …
Polca: Power oversubscription in LLM cloud providers
Recent innovation in large language models (LLMs), and their myriad use-cases have
rapidly driven up the compute capacity demand for datacenter GPUs. Several cloud …
Characterizing Power Management Opportunities for LLMs in the Cloud
Recent innovation in large language models (LLMs), and their myriad use cases have
rapidly driven up the compute demand for datacenter GPUs. Several cloud providers and …
Towards Greener LLMs: Bringing Energy-Efficiency to the Forefront of LLM Inference
With the ubiquitous use of modern large language models (LLMs) across industries, the
inference serving for these models is ever expanding. Given the high compute and memory …
An agile pathway towards carbon-aware clouds
Climate change is a pressing threat to planetary well-being that can be addressed only by
rapid near-term actions across all sectors. Yet, the cloud computing sector, with its …
Guser: A GPGPU Power Stressmark Generator
Power stress mark is crucial for estimating Thermal Design Power (TDP) of GPGPUs to
ensure efficient power control. This paper proposes Guser, the first systematic methodology …
Input-Dependent Power Usage in GPUs
GPUs are known to be power-hungry, and due to the boom in artificial intelligence, they are
currently the major contributors to the high power demands of upcoming datacenters. Most …