Prophet: Precise QoS prediction on non-preemptive accelerators to improve utilization in warehouse-scale computers

Q Chen, H Yang, M Guo, RS Kannan, J Mars… - Proceedings of the …, 2017 - dl.acm.org

Guaranteeing Quality-of-Service (QoS) of latency-sensitive applications while improving
server utilization through application co-location is important yet challenging in modern …

Cited by 181

The locality descriptor: A holistic cross-layer abstraction to express data locality in GPUs

N Vijaykumar, E Ebrahimi, K Hsieh… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org

Exploiting data locality in GPUs is critical to making more efficient use of the existing caches
and the NUMA-based memory hierarchy expected in future GPUs. While modern GPU …

Cited by 79

Coda: Enabling co-location of computation and data for multiple GPU systems

H Kim, R Hadidi, L Nai, H Kim, N Jayasena… - ACM Transactions on …, 2018 - dl.acm.org

To exploit parallelism and scalability of multiple GPUs in a system, it is critical to place
compute and data together. However, two key techniques that have been used to hide …

Cited by 35

CUDASTF: Bridging the Gap Between CUDA and Task Parallelism

C Augonnet, A Alexandrescu… - … Conference for High …, 2024 - ieeexplore.ieee.org

Organizing computation as asynchronous tasks with data-driven dependencies is a simple
and efficient model for single- and multi-GPU programs. Sequential Task Flow (STF) is such …

Converting data-parallelism to task-parallelism by rewrites: purely functional programs across multiple GPUs

BJ Svensson, M Vollmer, E Holk, TL McDonell… - Proceedings of the 4th …, 2015 - dl.acm.org

High-level domain-specific languages for array processing on the GPU are increasingly
common, but they typically only run on a single GPU. As computational power is distributed …

Cited by 11

Homp: Automated distribution of parallel loops and data in highly parallel accelerator-based systems

Y Yan, J Liu, KW Cameron… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org

Heterogeneous computing systems, e.g., those with accelerators than the host CPUs, offer
the accelerated performance for a variety of workloads. However, most parallel …

Cited by 9

Dynamic Task Scheduling Scheme for a GPGPU Programming Framework

K Ohno, R Yamamoto - 2015 Third International Symposium on …, 2015 - ieeexplore.ieee.org

The computational power and the physical memory size of a single GPU device are often
insufficient for large-scale problems. Using CUDA, the user must explicitly partition such …

Cited by 4

Dynamic task scheduling scheme for a GPGPU programming framework

K Ohno, R Yamamoto, H Tanaka - International Journal of …, 2016 - jstage.jst.go.jp

The computational power and the physical memory size of a single GPU device are often
insufficient for large-scale problems. Using CUDA, the user must explicitly partition such …

Cited by 2

Enhancing Programmability, Portability, and Performance with Rich Cross-layer Abstractions

N Vijaykumar - 2019 - search.proquest.com

Programmability, performance portability, and resource efficiency have emerged as critical
challenges in harnessing complex and diverse architectures today to obtain high …

Cited by 1

3-D Viewer for interpretation of multiple scan sections

B Baxter - Proceedings of the May 19-22, 1980, national …, 1980 - dl.acm.org

A new viewing device is being constructed which will allow a physician to examine multiple
scan sections simultaneously in their proper orientation in all three dimensions. Test images …

Cited by 1

Automatic execution of single-GPU computations across multiple GPUs
