From the future Si technology perspective: Challenges and opportunities
K Kim - 2010 International Electron Devices Meeting, 2010 - ieeexplore.ieee.org
As silicon technology enters sub-20nm nodes, new materials, structures and processes are
being introduced in order to continue with the advantages of dimensional scaling, e.g., 3D …
Communication lower bounds and optimal algorithms for numerical linear algebra
The traditional metric for the efficiency of a numerical algorithm has been the number of
arithmetic operations it performs. Technological trends have long been reducing the time to …
[BOOK][B] Space-filling curves: an introduction with applications in scientific computing
M Bader - 2012 - books.google.com
The present book provides an introduction to using space-filling curves (SFC) as tools in
scientific computing. Special focus is laid on the representation of SFC and on resulting …
A dependency-aware task-based programming environment for multi-core architectures
Parallel programming on SMP and multi-core architectures is hard. In this paper we present
a programming model for those environments based on automatic function level parallelism …
Communication-optimal parallel recursive rectangular matrix multiplication
Communication-optimal algorithms are known for square matrix multiplication. Here, we
obtain the first communication-optimal algorithm for all dimensions of rectangular matrices …
GraphGrind: Addressing load imbalance of graph partitioning
We investigate how graph partitioning adversely affects the performance of graph analytics.
We demonstrate that graph partitioning induces extra work during graph traversal and that …
An algorithm for the optimal control of the driving of trains
R Franke, P Terwiesch, M Meyer - Proceedings of the 39th IEEE …, 2000 - ieeexplore.ieee.org
We discuss an algorithm that optimizes the driving style of a train. The objective is to
minimize the electrical energy used for traction subject to constraints such as the travel time …
SuperMatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures
E Chan, ES Quintana-Orti, G Quintana-Orti… - Proceedings of the …, 2007 - dl.acm.org
We discuss the high-performance parallel implementation and execution of dense linear
algebra matrix operations on SMP architectures, with an eye towards multi-core processors …
The cache-oblivious Gaussian elimination paradigm: theoretical framework, parallelization and experimental evaluation
The Gaussian Elimination Paradigm (GEP) was introduced by the authors in [6] to represent
the triply-nested loop computation that occurs in several important algorithms including …
ULCC: a user-level facility for optimizing shared cache performance on multicores
Scientific applications face serious performance challenges on multicore processors, one of
which is caused by access contention in last level shared caches from multiple running …