A survey of methods for analyzing and improving GPU energy efficiency
Recent years have witnessed phenomenal growth in the computational capabilities and
applications of GPUs. However, this trend has also led to a dramatic increase in their power …
A taxonomy and survey of power models and power modeling for cloud servers
Due to the increasing demand of cloud resources, the ever-increasing number and scale of
cloud data centers make their massive power consumption a prominent issue today …
GPUs and the future of parallel computing
This article discusses the capabilities of state-of-the-art GPU-based high-throughput
computing systems and considers the challenges to scaling single-chip parallel-computing …
Cache-conscious wavefront scheduling
This paper studies the effects of hardware thread scheduling on cache management in
GPUs. We propose Cache-Conscious Wavefront Scheduling (CCWS), an adaptive …
OWL: cooperative thread array aware scheduling techniques for improving GPGPU performance
Emerging GPGPU architectures, along with programming models like CUDA and OpenCL,
offer a cost-effective platform for many applications by providing high thread level …
Neither more nor less: Optimizing thread-level parallelism for GPGPUs
General-purpose graphics processing units (GPGPUs) are at their best in accelerating
computation by exploiting abundant thread-level parallelism (TLP) offered by many classes …
Improving GPGPU resource utilization through alternative thread block scheduling
High performance in GPGPU workloads is obtained by maximizing parallelism and fully
utilizing the available resources. The thousands of threads are assigned to each core in …
Orchestrated scheduling and prefetching for GPGPUs
In this paper, we present techniques that coordinate the thread scheduling and prefetching
decisions in a General Purpose Graphics Processing Unit (GPGPU) architecture to better …
MRPB: Memory request prioritization for massively parallel processors
Massively parallel, throughput-oriented systems such as graphics processing units (GPUs)
offer high performance for a broad range of programs. They are, however, complex to …
Divergence-aware warp scheduling
This paper uses hardware thread scheduling to improve the performance and energy
efficiency of divergent applications on GPUs. We propose Divergence-Aware Warp …