Google Scholar

DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org

Data movement between the CPU and main memory is a first-order obstacle against
improving performance, scalability, and energy efficiency in modern systems. Computer systems …

Cited by 107 · All 13 versions

[PDF] arxiv.org

Hierarchical roofline analysis: How to collect data using performance tools on Intel CPUs and NVIDIA GPUs

C Yang - arXiv preprint arXiv:2009.02449, 2020 - arxiv.org

This paper surveys a range of methods to collect necessary performance data on Intel CPUs
and NVIDIA GPUs for hierarchical Roofline analysis. As of mid-2020, two vendor …

Cited by 28 · All 2 versions

[PDF] escholarship.org

[BOOK][B] An instruction roofline model for GPUs

N Ding, S Williams - 2019 - ieeexplore.ieee.org

The Roofline performance model provides an intuitive approach to identify performance
bottlenecks and guide performance optimization. However, the classic FLOP-centric …

Cited by 75 · All 10 versions

[PDF] wiley.com

Hierarchical Roofline analysis for GPUs: Accelerating performance optimization for the NERSC‐9 Perlmutter system

C Yang, T Kurth, S Williams - Concurrency and Computation …, 2020 - Wiley Online Library

The Roofline performance model provides an intuitive and insightful approach to identifying
performance bottlenecks and guiding performance optimization. In preparation for the next …

Cited by 73 · All 7 versions

[PDF] escholarship.org

A comprehensive methodology to optimize FPGA designs via the roofline model

M Siracusa, E Del Sozzo, M Rabozzi… - IEEE Transactions …, 2021 - ieeexplore.ieee.org

With reconfigurable fabrics delivering increasing performance over the years,
Field-Programmable Gate Arrays (FPGAs) are becoming an appealing solution for next …

Cited by 34 · All 5 versions

[PDF] unixer.de

Capability models for manycore memory systems: A case-study with Xeon Phi KNL

S Ramos, T Hoefler - 2017 IEEE International Parallel and …, 2017 - ieeexplore.ieee.org

Increasingly complex memory systems and onchip interconnects are developed to mitigate
the data movement bottlenecks in manycore processors. One example of such a complex …

Cited by 67 · All 28 versions

[PDF] hal.science

High-performance matrix-matrix multiplications of very small matrices

I Masliah, A Abdelfattah, A Haidar, S Tomov… - Euro-Par 2016: Parallel …, 2016 - Springer

The use of the general dense matrix-matrix multiplication (GEMM) is fundamental for
obtaining high performance in many scientific computing applications. GEMMs for small …

Cited by 69 · All 13 versions

[PDF] supercomputing.org

An empirical roofline methodology for quantitatively assessing performance portability

C Yang, R Gayatri, T Kurth, P Basu… - 2018 IEEE/ACM …, 2018 - ieeexplore.ieee.org

System and node architectures continue to diversify to better balance on-node computation,
memory capacity, memory bandwidth, interconnect bandwidth, power, and cost for specific …

Cited by 51 · All 6 versions

[PDF] researchgate.net

Exploring and analyzing the real impact of modern on-package memory on HPC scientific kernels

A Li, W Liu, MRB Kristensen, B Vinter, H Wang… - Proceedings of the …, 2017 - dl.acm.org

High-bandwidth On-Package Memory (OPM) innovates the conventional memory hierarchy
by augmenting a new on-package layer between classic on-chip cache and off-chip DRAM …

Cited by 57 · All 8 versions

GIRAF: General purpose in-storage resistive associative framework

L Yavits, R Kaplan, R Ginosar - IEEE Transactions on Parallel …, 2021 - ieeexplore.ieee.org

GIRAF is a General purpose In-storage Resistive Associative Framework based on resistive
content addressable memory (RCAM), which functions simultaneously as a storage and a …

Cited by 34 · All 5 versions
