Adaptive cache coherence mechanisms with producer–consumer sharing optimization for chip multiprocessors

A Kayi, O Serres, T El-Ghazawi - IEEE Transactions on …, 2013 - ieeexplore.ieee.org
In chip multiprocessors (CMPs), maintaining cache coherence can account for a major
performance overhead. Write-invalidate protocols adapted by most CMPs generate high …

Scalable parallel AMG on ccNUMA machines with OpenMP

M Förster, J Kraus - Computer Science-Research and Development, 2011 - Springer
In many numerical simulation codes the backbone of the application covers the solution of
linear systems of equations. Often, being created via a discretization of differential …

Address translation optimization for Unified Parallel C multi-dimensional arrays

O Serres, A Anbar, SG Merchant, A Kayi… - … on Parallel and …, 2011 - ieeexplore.ieee.org
Partitioned Global Address Space (PGAS) languages offer significant programmability
advantages with its global memory view abstraction, one-sided communication constructs …

Impact of the memory hierarchy on shared memory architectures in multicore programming models

RM Badia, JM Perez, E Ayguadé… - 2009 17th Euromicro …, 2009 - ieeexplore.ieee.org
Many and multicore architectures put a big pressure in parallel programming but gives a
unique opportunity to propose new programming models that automatically exploit the …

Point-to-point communication on gigabit ethernet and InfiniBand networks

R Ismail, NA Wati Abdul Hamid, M Othman… - … and Information Science …, 2011 - Springer
This paper presents the measurements of the MPI point-to-point communication
performances on Razi and Haitham clusters by using SKaMPI, IMB and MPBench …

Performance analysis of message passing interface collective communication on Intel Xeon quad-core Gigabit Ethernet and InfiniBand clusters

R Ismail, NAWA Hamid, M Othman… - Journal of Computer …, 2013 - researchgate.net
The performance of MPI implementation operations still presents critical issues for high
performance computing systems, particularly for more advanced processor technology …

MPI communication benchmarking on Intel Xeon dual quad-core processor cluster

R Ismail, NAWA Hamid, M Othman… - … IEEE Conference on …, 2011 - ieeexplore.ieee.org
This paper reports the measurements of MPI communication benchmarking on Khaldun
cluster which ran on Linux-based IBM Blade HS21 Servers with Intel Xeon dual quad-core …

STAND: New tool for performance estimation of the block data processing algorithms in high-load systems

V Minchenkov, V Bashun… - 2013 13th Conference of …, 2013 - ieeexplore.ieee.org
The main goal of this work is to present the developed research tool to find, investigate and
analyze hidden dependences between parameters of the hardware/software platforms (such …

An Efficient Cache Coherence Mechanism for Chip Multiprocessors

A Kayi - 2011 - search.proquest.com
Due to power and clocking constraints, integrating more processing cores onto a single chip,
instead of increasing the frequency has become the norm in modern processor design. This …

Analysis of Inter-Chip Communication Patterns on Multi-Core Distributed Shared-Memory Computers

M Mücke, W Gansterer - 2011 - eprints.cs.univie.ac.at
Multi-core multi-socket distributed shared-memory computers (DSM computers, for short)
have become an important node architecture in scientific computing as they provide …