Theoretical peak FLOPS per instruction set: a tutorial

R Dolbeau - The Journal of Supercomputing, 2018 - Springer
Traditionally, evaluating the theoretical peak performance of a CPU in FLOPS (floating-point
operations per second) was merely a matter of multiplying the frequency by the number of …

High-level synthesis hardware design for FPGA-based accelerators: Models, methodologies, and frameworks

RS Molina, V Gil-Costa, ML Crespo, G Ramponi - IEEE Access, 2022 - ieeexplore.ieee.org
Hardware accelerators based on field programmable gate array (FPGA) and system on chip
(SoC) devices have gained attention in recent years. One of the main reasons is that these …

Gables: A roofline model for mobile SoCs

M Hill, VJ Reddi - 2019 IEEE International Symposium on High …, 2019 - ieeexplore.ieee.org
Over a billion mobile consumer system-on-chip (SoC) chipsets ship each year. Of these, the
mobile consumer market undoubtedly involving smartphones has a significant market share …

Acceleration of tensor-product operations for high-order finite element methods

K Świrydowicz, N Chalmers… - … Journal of High …, 2019 - journals.sagepub.com
This article is devoted to graphics processing unit (GPU) kernel optimization and
performance analysis of three tensor-product operations arising in finite element methods …

The memory-bounded speedup model and its impacts in computing

XH Sun, X Lu - Journal of Computer Science and Technology, 2023 - Springer
With the surge of big data applications and the worsening of the memory-wall problem, the
memory system, instead of the computing unit, becomes the commonly recognized major …

Density matrix quantum circuit simulation via the BSP machine on modern GPU clusters

A Li, O Subasi, X Yang… - … conference for high …, 2020 - ieeexplore.ieee.org
As quantum computers evolve, simulations of quantum programs on classical computers will
be essential in validating quantum algorithms, understanding the effect of system noise, and …

MuchiSim: A simulation framework for design exploration of multi-chip manycore systems

M Orenes-Vera, E Tureci, M Martonosi… - … Analysis of Systems …, 2024 - ieeexplore.ieee.org
The design space exploration of scaled-out manycores for communication-intensive
applications (e.g., graph analytics and sparse linear algebra) is hampered due to either lack …

Fast multi-parameter performance modeling

A Calotoiu, D Beckinsale, CW Earl… - 2016 IEEE …, 2016 - ieeexplore.ieee.org
Tuning large applications requires a clever exploration of the design and configuration
space. Especially on supercomputers, this space is so large that its exhaustive traversal via …

K-Athena: A Performance Portable Structured Grid Finite Volume Magnetohydrodynamics Code

P Grete, FW Glines, BW O'Shea - IEEE Transactions on Parallel …, 2020 - ieeexplore.ieee.org
Large scale simulations are a key pillar of modern research and require ever-increasing
computational resources. Different novel manycore architectures have emerged in recent …

Parallelizing stream compression for IoT applications on asymmetric multicores

X Zeng, S Zhang - 2023 IEEE 39th International Conference on …, 2023 - ieeexplore.ieee.org
Data stream compression attracts much attention recently due to the rise of IoT applications.
Thanks to the balanced computational power and energy consumption, asymmetric …