- Academic Search

Survey on the run‐time systems of enterprise application integration platforms focusing on performance

DL Freire, RZ Frantz, F Roos‐Frantz… - Software: Practice and …, 2019 - Wiley Online Library

Companies are taking advantage of cloud computing to upgrade their business processes.
Cloud computing requires interaction with many kinds of applications, so it is necessary to …

Cited by 31 · All 5 versions

[BOOK] Understanding latency hiding on GPUs

V Volkov - 2016 - search.proquest.com

Modern commodity processors such as GPUs may execute up to about a thousand of
physical threads per chip to better utilize their numerous execution units and hide execution …

Cited by 125 · All 4 versions

A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling

E Konstantinidis, Y Cotronis - Journal of Parallel and Distributed Computing, 2017 - Elsevier

Typically, the execution time of a kernel on a GPU is a difficult to predict measure as it
depends on a wide range of factors. Performance can be limited by either memory transfer …

Cited by 86 · All 2 versions

Phases, Modalities, Spatial and Temporal Locality: Domain Specific ML Prefetcher for Accelerating Graph Analytics

P Zhang, R Kannan, VK Prasanna - Proceedings of the International …, 2023 - dl.acm.org

Memory performance is a key bottleneck in accelerating graph analytics. Existing Machine
Learning (ML) prefetchers encounter challenges with phase transitions and irregular …

Cited by 2 · All 3 versions

A practical performance model for compute and memory bound GPU kernels

E Konstantinidis, Y Cotronis - 2015 23rd Euromicro …, 2015 - ieeexplore.ieee.org

Performance prediction of GPU kernels is generally a tedious procedure with unpredictable
results. In this paper, we provide a practical model for estimating performance of CUDA …

Cited by 46 · All 4 versions

[PDF] Enhancing the performance of the aggregated bit vector algorithm in network packet classification using GPU

M Abbasi, R Tahouri, M Rafiee - PeerJ Computer Science, 2019 - peerj.com

Packet classification is a computationally intensive, highly parallelizable task in many
advanced network systems like high-speed routers and firewalls that enable different …

Cited by 25 · All 9 versions

Rethinking memory management in modern operating system: Horizontal, vertical or random?

L Liu, Y Li, C Ding, H Yang, C Wu - IEEE Transactions on …, 2015 - ieeexplore.ieee.org

On modern multicore machines, the memory management typically combines address
interleaving in hardware and random allocation in the operating system (OS) to improve …

Cited by 39 · All 5 versions

Memory performance and bottlenecks in multicore and GPU architectures

MS Serpa, FB Moreira, POA Navaux… - 2019 27th Euromicro …, 2019 - ieeexplore.ieee.org

Nowadays, there are several different architectures available not only for the industry, but
also for normal consumers. Traditional multicore processors, GPUs, accelerators such as the …

Cited by 21 · All 8 versions

Alinea: An advanced linear algebra library for massively parallel computations on graphics processing units

F Magoules, AKC Ahamed - The International Journal of …, 2015 - journals.sagepub.com

Direct and iterative methods are often used to solve linear systems in engineering. The
matrices involved can be large, which leads to heavy computations on the central …

Cited by 28 · All 6 versions

Analysis-driven engineering of comparison-based sorting algorithms on GPUs

B Karsin, V Weichert, H Casanova, J Iacono… - Proceedings of the …, 2018 - dl.acm.org

We study the relationship between memory accesses, bank conflicts, thread multiplicity (also
known as over-subscription) and instruction-level parallelism in comparison-based sorting …

Cited by 16 · All 12 versions

A memory access model for highly-threaded many-core architectures
