StarPU: A Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures

C Augonnet, S Thibault, R Namyst… - Euro-Par 2009 Parallel …, 2009 - Springer
In the field of HPC, the current hardware trend is to design multiprocessor architectures that
feature heterogeneous technologies such as specialized coprocessors (e.g. Cell/BE SPUs) or …

OmpSs: a proposal for programming heterogeneous multi-core architectures

A Duran, E Ayguadé, RM Badia, J Labarta… - Parallel processing …, 2011 - World Scientific
In this paper, we present OmpSs, a programming model based on OpenMP and StarSs, that
can also incorporate the use of OpenCL or CUDA kernels. We evaluate the proposal on …

A dependency-aware task-based programming environment for multi-core architectures

JM Perez, RM Badia, J Labarta - 2008 IEEE international …, 2008 - ieeexplore.ieee.org
Parallel programming on SMP and multi-core architectures is hard. In this paper we present
a programming model for those environments based on automatic function level parallelism …

Productive programming of GPU clusters with OmpSs

J Bueno, J Planas, A Duran, RM Badia… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
Clusters of GPUs are emerging as a new computational scenario. Programming them
requires the use of hybrid models that increase the complexity of the applications, reducing …

Criticality-aware dynamic task scheduling for heterogeneous architectures

K Chronaki, A Rico, RM Badia, E Ayguadé… - Proceedings of the 29th …, 2015 - dl.acm.org
Current and future parallel programming models need to be portable and efficient when
moving to heterogeneous multi-core systems. OmpSs is a task-based programming model …

Task scheduling techniques for asymmetric multi-core systems

K Chronaki, A Rico, M Casas, M Moretó… - … on Parallel and …, 2016 - ieeexplore.ieee.org
As performance and energy efficiency have become the main challenges for next-generation
high-performance computing, asymmetric multi-core architectures can provide …

Productive cluster programming with OmpSs

J Bueno, L Martinell, A Duran, M Farreras… - Euro-Par 2011 Parallel …, 2011 - Springer
Clusters of SMPs are ubiquitous. They have been traditionally programmed by using MPI.
But, the productivity of MPI programmers is low because of the complexity of expressing …

Using a "codelet" program execution model for exascale machines: position paper

S Zuckerman, J Suetterlein, R Knauerhase… - Proceedings of the 1st …, 2011 - dl.acm.org
As computing has moved relentlessly through giga-, tera-, and peta-scale systems,
exa-scale (a million trillion operations/sec.) computing is currently under active research. DARPA …

An algorithm for the optimal control of the driving of trains

R Franke, P Terwiesch, M Meyer - Proceedings of the 39th IEEE …, 2000 - ieeexplore.ieee.org
We discuss an algorithm that optimizes the driving style of a train. The objective is to
minimize the electrical energy used for traction subject to constraints such as the travel time …

Scheduling dense linear algebra operations on multicore processors

J Kurzak, H Ltaief, J Dongarra… - … Practice and Experience, 2010 - Wiley Online Library
State‐of‐the‐art dense linear algebra software, such as the LAPACK and ScaLAPACK
libraries, suffers performance losses on multicore processors due to their inability to fully …