Hierarchical DAG scheduling for hybrid distributed systems

W Wu, A Bouteiller, G Bosilca… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
Accelerator-enhanced computing platforms have drawn a lot of attention due to their
massive peak computational capacity. Despite significant advances in the programming …

Region templates: Data representation and management for high-throughput image analysis

G Teodoro, T Pan, T Kurc, J Kong, L Cooper, S Klasky… - Parallel Computing, 2014 - Elsevier
We introduce a region template abstraction and framework for the efficient storage,
management and processing of common data types in analysis of large datasets of high …

Task-based Cholesky decomposition on Knights Corner using OpenMP

J Dorris, J Kurzak, P Luszczek, A YarKhan… - … Conference on High …, 2016 - Springer
The growing popularity of the Intel Xeon Phi coprocessors and the continued development
of this new many-core architecture have created the need for an open-source, scalable, and …

Balancing task- and data-level parallelism to improve performance and energy consumption of matrix computations on the Intel Xeon Phi

MF Dolz, FD Igual, T Ludwig, L Piñuel… - Computers & Electrical …, 2015 - Elsevier
The emergence of new manycore architectures, such as the Intel Xeon Phi, poses new
challenges in how to adapt existing libraries and applications to this type of system. In …

Performance evaluation of Kvazaar HEVC intra encoder on Xeon Phi many-core processor

A Koivula, M Viitanen, A Lemmetti… - 2015 IEEE global …, 2015 - ieeexplore.ieee.org
This paper analyzes parallel scalability and coding speed of our open-source Kvazaar
HEVC intra encoder on Intel Xeon Phi 61-core coprocessor that supports up to four …

A general and efficient divide-and-conquer algorithm framework for multi-core clusters

CH González, BB Fraguela - Cluster Computing, 2017 - Springer
Divide-and-conquer is one of the most important patterns of parallelism, being applicable to
a large variety of problems. In addition, the most powerful parallel systems available …

Efficient execution of microscopy image analysis on CPU, GPU, and MIC equipped cluster systems

G Andrade, R Ferreira, G Teodoro… - 2014 IEEE 26th …, 2014 - ieeexplore.ieee.org
High performance computing is experiencing a major paradigm shift with the introduction of
accelerators, such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These …

Improving communication and load balancing with thread mapping in manycore systems

EHM Cruz, M Diener, MS Serpa… - 2018 26th Euromicro …, 2018 - ieeexplore.ieee.org
Communication and load balancing have a significant impact on the performance of parallel
applications and have been the subject of extensive research in multicore architectures …

HPSM: a programming framework to exploit multi-CPU and multi-GPU systems simultaneously

JVF Lima, DD Domenico - International Journal of Grid and …, 2019 - inderscienceonline.com
This paper presents a high-level C++ framework to explore multi-CPU and multi-GPU
systems called HPSM. HPSM enables execution of parallel loops and reductions …

A server-side accelerator framework for multi-core CPUs and Intel Xeon Phi co-processor systems

G You, X Wang - Cluster Computing, 2020 - Springer
Processing-intensive web server requests can lead to low Quality of Service (QoS), such as
longer mean response time and lower throughput, which calls for a new web server software …