A survey of methods for analyzing and improving GPU energy efficiency

S Mittal, JS Vetter - ACM Computing Surveys (CSUR), 2014 - dl.acm.org
Recent years have witnessed phenomenal growth in the computational capabilities and
applications of GPUs. However, this trend has also led to a dramatic increase in their power …

A taxonomy and survey of power models and power modeling for cloud servers

W Lin, F Shi, W Wu, K Li, G Wu… - ACM Computing Surveys …, 2020 - dl.acm.org
Due to the increasing demand of cloud resources, the ever-increasing number and scale of
cloud data centers make their massive power consumption a prominent issue today …

GPUs and the future of parallel computing

SW Keckler, WJ Dally, B Khailany, M Garland… - IEEE Micro, 2011 - ieeexplore.ieee.org
This article discusses the capabilities of state-of-the-art GPU-based high-throughput
computing systems and considers the challenges to scaling single-chip parallel-computing …

Cache-conscious wavefront scheduling

TG Rogers, M O'Connor… - 2012 45th Annual IEEE …, 2012 - ieeexplore.ieee.org
This paper studies the effects of hardware thread scheduling on cache management in
GPUs. We propose Cache-Conscious Wavefront Scheduling (CCWS), an adaptive …

OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance

A Jog, O Kayiran, N Chidambaram Nachiappan… - ACM SIGPLAN …, 2013 - dl.acm.org
Emerging GPGPU architectures, along with programming models like CUDA and OpenCL,
offer a cost-effective platform for many applications by providing high thread level …

Neither more nor less: Optimizing thread-level parallelism for GPGPUs

O Kayıran, A Jog, MT Kandemir… - Proceedings of the 22nd …, 2013 - ieeexplore.ieee.org
General-purpose graphics processing units (GPGPUs) are at their best in accelerating
computation by exploiting abundant thread-level parallelism (TLP) offered by many classes …

Improving GPGPU resource utilization through alternative thread block scheduling

M Lee, S Song, J Moon, J Kim, W Seo… - 2014 IEEE 20th …, 2014 - ieeexplore.ieee.org
High performance in GPGPU workloads is obtained by maximizing parallelism and fully
utilizing the available resources. The thousands of threads are assigned to each core in …

Orchestrated scheduling and prefetching for GPGPUs

A Jog, O Kayiran, AK Mishra, MT Kandemir… - Proceedings of the 40th …, 2013 - dl.acm.org
In this paper, we present techniques that coordinate the thread scheduling and prefetching
decisions in a General Purpose Graphics Processing Unit (GPGPU) architecture to better …

MRPB: Memory request prioritization for massively parallel processors

W Jia, KA Shaw, M Martonosi - 2014 IEEE 20th international …, 2014 - ieeexplore.ieee.org
Massively parallel, throughput-oriented systems such as graphics processing units (GPUs)
offer high performance for a broad range of programs. They are, however, complex to …

Divergence-aware warp scheduling

TG Rogers, M O'Connor, TM Aamodt - … of the 46th Annual IEEE/ACM …, 2013 - dl.acm.org
This paper uses hardware thread scheduling to improve the performance and energy
efficiency of divergent applications on GPUs. We propose Divergence-Aware Warp …