Theoretical peak FLOPS per instruction set: a tutorial
R Dolbeau - The Journal of Supercomputing, 2018 - Springer
Traditionally, evaluating the theoretical peak performance of a CPU in FLOPS (floating-point
operations per second) was merely a matter of multiplying the frequency by the number of …
High-level synthesis hardware design for FPGA-based accelerators: Models, methodologies, and frameworks
Hardware accelerators based on field programmable gate array (FPGA) and system on chip
(SoC) devices have gained attention in recent years. One of the main reasons is that these …
Gables: A roofline model for mobile SoCs
Over a billion mobile consumer system-on-chip (SoC) chipsets ship each year. Of these, the
mobile consumer market undoubtedly involving smartphones has a significant market share …
Acceleration of tensor-product operations for high-order finite element methods
This article is devoted to graphics processing unit (GPU) kernel optimization and
performance analysis of three tensor-product operations arising in finite element methods …
The memory-bounded speedup model and its impacts in computing
With the surge of big data applications and the worsening of the memory-wall problem, the
memory system, instead of the computing unit, becomes the commonly recognized major …
Density matrix quantum circuit simulation via the BSP machine on modern GPU clusters
As quantum computers evolve, simulations of quantum programs on classical computers will
be essential in validating quantum algorithms, understanding the effect of system noise, and …
MuchiSim: A simulation framework for design exploration of multi-chip manycore systems
The design space exploration of scaled-out manycores for communication-intensive
applications (e.g., graph analytics and sparse linear algebra) is hampered due to either lack …
Fast multi-parameter performance modeling
Tuning large applications requires a clever exploration of the design and configuration
space. Especially on supercomputers, this space is so large that its exhaustive traversal via …
K-Athena: A Performance Portable Structured Grid Finite Volume Magnetohydrodynamics Code
Large scale simulations are a key pillar of modern research and require ever-increasing
computational resources. Different novel manycore architectures have emerged in recent …
Parallelizing stream compression for IoT applications on asymmetric multicores
Data stream compression has attracted much attention recently due to the rise of IoT applications.
Thanks to the balanced computational power and energy consumption, asymmetric …