Symphony: Orchestrating sparse and dense tensors with hierarchical heterogeneous processing
Sparse tensor algorithms are becoming widespread, particularly in the domains of deep
learning, graph and data analytics, and scientific computing. Current high-performance …
Going further with Winograd convolutions: Tap-wise quantization for efficient inference on 4x4 tiles
Most of today's computer vision pipelines are built around deep neural networks, where
convolution operations require most of the generally high compute effort. The Winograd …
Principal kernel analysis: A tractable methodology to simulate scaled GPU workloads
C Avalos Baddouh, M Khairy, RN Green… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Simulating all threads in a scaled GPU workload results in prohibitive simulation cost. Cycle-level simulation is orders of magnitude slower than native silicon; the only solution is to …
Navisim: A highly accurate GPU simulator for AMD RDNA GPUs
As GPUs continue to grow in popularity for accelerating demanding applications, such as
high-performance computing and machine learning, GPU architects need to deliver more …
GPS: A global publish-subscribe model for multi-GPU memory management
Suboptimal management of memory and bandwidth is one of the primary causes of low
performance on systems comprising multiple GPUs. Existing memory management solutions …
Path Forward Beyond Simulators: Fast and Accurate GPU Execution Time Prediction for DNN Workloads
Today, DNNs' high computational complexity and sub-optimal device utilization present a
major roadblock to democratizing DNNs. To reduce the execution time and improve device …
REC: Enhancing fine-grained cache coherence protocol in multi-GPU systems
With the increasing demands of modern workloads, multi-GPU systems have emerged as a
scalable solution, extending performance beyond the capabilities of single GPUs. However …
Photon: A fine-grained sampled simulation methodology for GPU workloads
GPUs, due to their massively-parallel computing architectures, provide high performance for
data-parallel applications. However, existing GPU simulators are too slow to enable …
FinePack: Transparently improving the efficiency of fine-grained transfers in multi-GPU systems
Recent studies have shown that using fine-grained peer-to-peer (P2P) stores to
communicate among devices in multi-GPU systems is a promising path to achieve strong …
A Survey on Heterogeneous CPU–GPU Architectures and Simulators
M Alaei, F Yazdanpanah - Concurrency and Computation …, 2025 - Wiley Online Library
Heterogeneous architectures are widely used in various high-performance computing
systems, from IoT-based embedded architectures to edge and cloud systems. Although …