Survey on the run‐time systems of enterprise application integration platforms focusing on performance

DL Freire, RZ Frantz, F Roos‐Frantz… - Software: Practice and …, 2019 - Wiley Online Library
Companies are taking advantage of cloud computing to upgrade their business processes.
Cloud computing requires interaction with many kinds of applications, so it is necessary to …

[BOOK][B] Understanding latency hiding on GPUs

V Volkov - 2016 - search.proquest.com
Modern commodity processors such as GPUs may execute up to about a thousand of
physical threads per chip to better utilize their numerous execution units and hide execution …

A quantitative roofline model for GPU kernel performance estimation using micro-benchmarks and hardware metric profiling

E Konstantinidis, Y Cotronis - Journal of Parallel and Distributed Computing, 2017 - Elsevier
Typically, the execution time of a kernel on a GPU is a difficult to predict measure as it
depends on a wide range of factors. Performance can be limited by either memory transfer …

Phases, Modalities, Spatial and Temporal Locality: Domain Specific ML Prefetcher for Accelerating Graph Analytics

P Zhang, R Kannan, VK Prasanna - Proceedings of the International …, 2023 - dl.acm.org
Memory performance is a key bottleneck in accelerating graph analytics. Existing Machine
Learning (ML) prefetchers encounter challenges with phase transitions and irregular …

A practical performance model for compute and memory bound GPU kernels

E Konstantinidis, Y Cotronis - 2015 23rd Euromicro …, 2015 - ieeexplore.ieee.org
Performance prediction of GPU kernels is generally a tedious procedure with unpredictable
results. In this paper, we provide a practical model for estimating performance of CUDA …

[PDF] Enhancing the performance of the aggregated bit vector algorithm in network packet classification using GPU

M Abbasi, R Tahouri, M Rafiee - PeerJ Computer Science, 2019 - peerj.com
Packet classification is a computationally intensive, highly parallelizable task in many
advanced network systems like high-speed routers and firewalls that enable different …

Rethinking memory management in modern operating system: Horizontal, vertical or random?

L Liu, Y Li, C Ding, H Yang, C Wu - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
On modern multicore machines, the memory management typically combines address
interleaving in hardware and random allocation in the operating system (OS) to improve …

Memory performance and bottlenecks in multicore and GPU architectures

MS Serpa, FB Moreira, POA Navaux… - 2019 27th Euromicro …, 2019 - ieeexplore.ieee.org
Nowadays, there are several different architectures available not only for the industry, but
also for normal consumers. Traditional multicore processors, GPUs, accelerators such as the …

Alinea: An advanced linear algebra library for massively parallel computations on graphics processing units

F Magoules, AKC Ahamed - The International Journal of …, 2015 - journals.sagepub.com
Direct and iterative methods are often used to solve linear systems in engineering. The
matrices involved can be large, which leads to heavy computations on the central …

Analysis-driven engineering of comparison-based sorting algorithms on GPUs

B Karsin, V Weichert, H Casanova, J Iacono… - Proceedings of the …, 2018 - dl.acm.org
We study the relationship between memory accesses, bank conflicts, thread multiplicity (also
known as over-subscription) and instruction-level parallelism in comparison-based sorting …