The LINPACK benchmark: past, present and future

JJ Dongarra, P Luszczek… - … and Computation: practice …, 2003 - Wiley Online Library
This paper describes the LINPACK Benchmark and some of its variations commonly used to
assess the performance of computer systems. Aside from the LINPACK Benchmark suite, the …

Accelerating numerical dense linear algebra calculations with GPUs

J Dongarra, M Gates, A Haidar, J Kurzak… - … computations with GPUs, 2014 - Springer
This chapter presents the current best design and implementation practices for the
acceleration of dense linear algebra (DLA) on GPUs. Examples are given with fundamental …

HPC Programming on Intel Many‐Integrated‐Core Hardware with MAGMA Port to Xeon Phi

J Dongarra, M Gates, A Haidar, Y Jia… - Scientific …, 2015 - Wiley Online Library
This paper presents the design and implementation of several fundamental dense linear
algebra (DLA) algorithms for multicore with Intel Xeon Phi coprocessors. In particular, we …

Unified development for mixed multi-GPU and multi-coprocessor environments using a lightweight runtime environment

A Haidar, C Cao, A Yarkhan, P Luszczek… - 2014 IEEE 28th …, 2014 - ieeexplore.ieee.org
Many of the heterogeneous resources available to modern computers are designed for
different workloads. In order to efficiently use GPU resources, the workload must have a …

High-performance Cholesky factorization for GPU-only execution

A Haidar, A Abdelfatah, S Tomov… - Proceedings of the General …, 2017 - dl.acm.org
We present our performance analysis, algorithm designs, and the optimizations needed for
the development of high-performance GPU-only algorithms, and in particular, for the dense …

LU, QR, and Cholesky factorizations: Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi

A Haidar, S Tomov, K Arturov, M Guney… - 2016 IEEE High …, 2016 - ieeexplore.ieee.org
A wide variety of heterogeneous compute resources, ranging from multicore CPUs to GPUs
and coprocessors, are available to modern computers, making it challenging to design …

Flexible linear algebra development and scheduling with Cholesky factorization

A Haidar, A YarKhan, C Cao… - 2015 IEEE 17th …, 2015 - ieeexplore.ieee.org
Modern high performance computing environments are composed of networks of compute
nodes that often contain a variety of heterogeneous compute resources, such as multicore …

Model-driven one-sided factorizations on multicore accelerated systems

J Dongarra, A Haidar, J Kurzak, P Luszczek… - Supercomputing …, 2014 - superfri.org
Hardware heterogeneity of the HPC platforms is no longer considered unusual but instead
has become the most viable way forward towards Exascale. In fact, the multitude of the …

Accelerated methods for performing the LDLT decomposition

PE Strazdins - The Proceedings of ANZIAM, 2000 - journal.austms.org.au
This paper describes the design, implementation and performance of parallel direct dense
symmetric-indefinite matrix factorisation algorithms. These algorithms use the Bunch …

[BOOK][B] A dense complex symmetric indefinite solver for the Fujitsu AP3000

P Strazdins - 1999 - Citeseer