Algorithmic redistribution methods for block-cyclic decompositions

AP Petitet, JJ Dongarra - IEEE Transactions on Parallel and …, 1999 - ieeexplore.ieee.org
This article presents various data redistribution methods for block-partitioned linear algebra
algorithms operating on dense matrices that are distributed in a block-cyclic fashion …
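To make "distributed in a block-cyclic fashion" concrete, here is a minimal sketch of the standard one-dimensional cyclic(block) mapping. It assumes zero-based indices and a single dimension. The function name block_cyclic_owner is hypothetical and is not taken from the article. Blocks of `block` consecutive elements are dealt round-robin to `nprocs` processes.

```python
def block_cyclic_owner(i: int, block: int, nprocs: int) -> tuple[int, int]:
    """Map global index i under a cyclic(block) distribution over nprocs processes.

    Hypothetical helper for illustration. Returns (owner process, local index).
    """
    blk = i // block              # global block number containing element i
    owner = blk % nprocs          # blocks are assigned round-robin to processes
    local_blk = blk // nprocs     # number of earlier blocks this owner already holds
    local = local_blk * block + i % block
    return owner, local


# Example: 10 elements, block size 2, 3 processes.
for i in range(10):
    print(i, block_cyclic_owner(i, 2, 3))
```

In this example, element 6 lands in global block 3, which wraps back to process 0 as that process's second block, so its local index is 2.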

From heterogeneous task scheduling to heterogeneous mixed parallel scheduling

F Suter, F Desprez, H Casanova - Euro-Par 2004 Parallel Processing: 10th …, 2004 - Springer
Mixed-parallelism, the combination of data-and task-parallelism, is a powerful way of
increasing the scalability of entire classes of parallel applications on platforms comprising …

String matching on multicontext FPGAs using self-reconfiguration

RPS Sidhu, A Mei, VK Prasanna - Proceedings of the 1999 ACM/SIGDA …, 1999 - dl.acm.org
FPGAs can perform better than ASICs if the logic mapped onto them is optimized for each
problem instance. Unfortunately, this advantage is often canceled by the long time needed …

Efficient algorithms for block-cyclic array redistribution between processor sets

N Park, VK Prasanna… - IEEE Transactions on …, 1999 - ieeexplore.ieee.org
Run-time array redistribution is necessary to enhance the performance of parallel programs
on distributed memory supercomputers. In this paper, we present an efficient algorithm for …
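The core of redistribution between processor sets is deciding which elements each source process sends to each destination process. The sketch below enumerates these message sets naively, element by element, for a move from cyclic(r) on p processes to cyclic(s) on q processes. This brute-force approach is only an illustration. It is not the algorithm from the paper, which the abstract describes as efficient and which presumably avoids per-element enumeration. The function names are hypothetical.

```python
from collections import defaultdict


def owner(i: int, block: int, nprocs: int) -> int:
    """Process that owns global index i under a cyclic(block) distribution."""
    return (i // block) % nprocs


def redistribution_messages(n: int, r: int, p: int, s: int, q: int) -> dict:
    """Naively list the global indices exchanged by each (source, destination) pair
    when an n-element array moves from cyclic(r) on p processes to cyclic(s) on q.

    Hypothetical illustration; cost is O(n) regardless of the pattern's regularity.
    """
    msgs = defaultdict(list)
    for i in range(n):
        msgs[(owner(i, r, p), owner(i, s, q))].append(i)
    return dict(msgs)


# Example: 8 elements, cyclic(1) on 2 processes -> cyclic(4) (i.e. block) on 2 processes.
print(redistribution_messages(8, 1, 2, 4, 2))
```

Because block-cyclic patterns repeat with a period of lcm(r·p, s·q) elements, practical methods typically compute one period and replicate it rather than scanning every index.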

A generalized processor mapping technique for array redistribution

CH Hsu, YC Chung, DL Yang… - IEEE Transactions on …, 2001 - ieeexplore.ieee.org
In many scientific applications, array redistribution is usually required to enhance data
locality and reduce remote memory access in many parallel programs on distributed …

A framework for efficient data redistribution on distributed memory multicomputers

M Guo, I Nakata - The Journal of Supercomputing, 2001 - Springer
Array redistribution is often required in programs on distributed memory parallel computers.
It is essential to use efficient algorithms for redistribution; otherwise the performance of the …

Memory-efficient array redistribution through portable collective communication

NA Rink, A Paszke, D Vytiniotis, GS Schmid - arXiv preprint arXiv …, 2021 - arxiv.org
Modern large-scale deep learning workloads highlight the need for parallel execution
across many devices in order to fit model data into hardware accelerator memories. In these …

A mixed triangular and quadrilateral partition for fractal image coding

F Davoine, J Svensson… - … Conference on Image …, 1995 - ieeexplore.ieee.org
This paper presents a new partitioning scheme for fractal image coding, based on triangles
and quadrilaterals. The aim is to have the advantage of the triangles over the square and …

Contention-free communication scheduling for array redistribution

M Guo, I Nakata, Y Yamashita - … 1998 International Conference …, 1998 - ieeexplore.ieee.org
Array redistribution is often required in programs on distributed memory parallel computers.
It is essential to use efficient algorithms for redistribution; otherwise the performance of the …
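"Contention-free" scheduling is commonly understood to mean that, in each communication step, every process sends at most one message and receives at most one. The sketch below shows this idea with a simple greedy grouping of (source, destination) messages into steps. The greedy strategy is an assumption chosen for brevity and is not the scheduling algorithm of the paper. In general, greedy edge coloring can use more steps than the optimum, which equals the maximum number of messages any single process sends or receives. The function name is hypothetical.

```python
def greedy_contention_free_schedule(pairs):
    """Group (src, dst) messages into steps in which no process sends more than one
    message and no process receives more than one.

    Hypothetical greedy sketch: each message goes into the earliest compatible step.
    """
    steps = []  # each entry: (busy senders, busy receivers, messages in this step)
    for src, dst in pairs:
        for senders, receivers, batch in steps:
            if src not in senders and dst not in receivers:
                senders.add(src)
                receivers.add(dst)
                batch.append((src, dst))
                break
        else:
            # No existing step can take this message without contention: open a new one.
            steps.append(({src}, {dst}, [(src, dst)]))
    return [batch for _, _, batch in steps]


print(greedy_contention_free_schedule([(0, 0), (1, 0), (0, 1), (1, 1)]))
```

For this 2×2 all-to-all pattern, the greedy pass finds two steps. Two is optimal here, because each process must both send and receive two messages.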

Parallel extension of a dynamic performance forecasting tool

E Caron, F Suter - 2002 - inria.hal.science
This article presents an extension of the library to handle parallel routines. is a dynamic
performance forecasting tool in a metacomputing environment. Here we propose to combine …