- Academic Search

Chai: Collaborative heterogeneous applications for integrated-architectures

J Gómez-Luna, I El Hajj, LW Chang… - … Analysis of Systems …, 2017 - ieeexplore.ieee.org

Heterogeneous system architectures are evolving towards tighter integration among
devices, with emerging features such as shared virtual memory, memory coherence, and …

Cited by 121, all 13 versions

[PDF] acm.org

Fast segmented sort on GPUs

K Hou, W Liu, H Wang, W Feng - Proceedings of the International …, 2017 - dl.acm.org

Segmented sort, as a generalization of classical sort, orders a batch of independent
segments in a whole array. Along with the wider adoption of manycore processors for HPC …

Cited by 72, all 5 versions

[PDF] wisc.edu

CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization

P Dalmia, RS Kumar, MD Sinclair - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org

Chiplets are transforming computer system designs, allowing system designers to combine
heterogeneous computing resources at unprecedented scales. Breaking larger, monolithic …

Cited by 2, all 7 versions

IRIS: A performance-portable framework for cross-platform heterogeneous computing

J Kim, S Lee, B Johnston… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

From edge to exascale, computer architectures are becoming more heterogeneous and
complex. The systems typically have fat nodes, with multicore CPUs and multiple hardware …

Cited by 2, all 7 versions

[PDF] acm.org

Wireframe: Supporting data-dependent parallelism through dependency graph execution in GPUs

AA Abdolrashidi, D Tripathy, ME Belviranli… - Proceedings of the 50th …, 2017 - dl.acm.org

GPUs lack fundamental support for data-dependent parallelism and synchronization. While
CUDA Dynamic Parallelism signals progress in this direction, many limitations and …

Cited by 45, all 17 versions

[PDF] arxiv.org

Computation vs. communication scaling for future transformers on future hardware

S Pati, S Aga, M Islam, N Jayasena… - arXiv preprint arXiv …, 2023 - arxiv.org

Scaling neural network models has delivered dramatic quality gains across ML problems.
However, this scaling has increased the reliance on efficient distributed training techniques …

Cited by 6, all 2 versions

[PDF] nsf.gov

Blockmaestro: Enabling programmer-transparent task-based execution in GPU systems

AA Abdolrashidi, HA Esfeden… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org

As modern GPU workloads grow in size and complexity, there is an ever-increasing demand
for GPU computational power. Emerging workloads contain hundreds or thousands of GPU …

Cited by 19, all 8 versions

[PDF] acm.org

Versapipe: a versatile programming framework for pipelined computing on GPU

Z Zheng, C Oh, J Zhai, X Shen, Y Yi… - Proceedings of the 50th …, 2017 - dl.acm.org

Pipeline is an important programming pattern, while GPU, designed mostly for data-level
parallel executions, lacks an efficient mechanism to support pipeline programming and …

Cited by 40, all 9 versions

[PDF] github.io

A survey of architectural approaches for improving GPGPU performance, programmability and heterogeneity

M Khairy, AG Wassal, M Zahran - Journal of Parallel and Distributed …, 2019 - Elsevier

With the skyrocketing advances of process technology, the increased need to process huge
amount of data, and the pivotal need for power efficiency, the usage of Graphics Processing …

Cited by 36, all 4 versions

[PDF] github.io

Oversubscribed command queues in GPUs

S Puthoor, X Tang, J Gross, BM Beckmann - Proceedings of the 11th …, 2018 - dl.acm.org

As GPUs become larger and provide an increasing number of parallel execution units, a
single kernel is no longer sufficient to utilize all available resources. As a result, GPU …

Cited by 32, all 3 versions

KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism

Chai: Collaborative heterogeneous applications for integrated-architectures