Google Scholar

Dnnfusion: accelerating deep neural networks execution with advanced operator fusion

W Niu, J Guan, Y Wang, G Agrawal, B Ren - Proceedings of the 42nd …, 2021 - dl.acm.org

Deep Neural Networks (DNNs) have emerged as the core enabler of many major
applications on mobile devices. To achieve high accuracy, DNN models have become …

Cited by 155 (7 versions)

Data reorganization in memory using 3D-stacked DRAM

B Akin, F Franchetti, JC Hoe - ACM SIGARCH Computer Architecture …, 2015 - dl.acm.org

In this paper we focus on common data reorganization operations such as shuffle,
pack/unpack, swap, transpose, and layout transformations. Although these operations …

Cited by 257 (5 versions)

The design and use of simplepower: a cycle-accurate energy estimation tool

W Ye, N Vijaykrishnan, M Kandemir… - Proceedings of the 37th …, 2000 - dl.acm.org

In this paper, we present the design and use of a comprehensive framework, SimplePower,
for evaluating the effect of high-level algorithmic, architectural, and compilation trade-offs on …

Cited by 684 (15 versions)

Tiling optimizations for 3D scientific computations

G Rivera, CW Tseng - SC'00: Proceedings of the 2000 ACM …, 2000 - ieeexplore.ieee.org

Compiler transformations can significantly improve data locality for many scientific programs.
In this paper, we show iterative solvers for partial differential equations (PDEs) in three …

Cited by 315 (9 versions)

Energy-driven integrated hardware-software optimizations using SimplePower

N Vijaykrishnan, M Kandemir, MJ Irwin… - ACM SIGARCH …, 2000 - dl.acm.org

With the emergence of a plethora of embedded and portable applications, energy
dissipation has joined throughput, area, and accuracy/precision as a major design …

Cited by 350 (17 versions)

Influence of compiler optimizations on system power

M Kandemir, N Vijaykrishnan, MJ Irwin… - Proceedings of the 37th …, 2000 - dl.acm.org

High-level compiler optimizations have been widely used to achieve speedups on array-based
codes. Such optimizations are becoming increasingly important in embedded signal …

Cited by 215 (20 versions)

Compile-time composition of run-time data and iteration reorderings

MM Strout, L Carter, J Ferrante - Proceedings of the ACM SIGPLAN 2003 …, 2003 - dl.acm.org

Many important applications, such as those using sparse data structures, have memory
reference patterns that are unknown at compile-time. Prior work has developed run-time …

Cited by 153 (14 versions)

Tiling, block data layout, and memory hierarchy performance

N Park, B Hong, VK Prasanna - IEEE Transactions on Parallel …, 2003 - ieeexplore.ieee.org

Recently, several experimental studies have been conducted on block data layout in
conjunction with tiling as a data transformation technique to improve cache performance. In …

Cited by 157 (9 versions)

Stream programming on general-purpose processors

J Gummaraju, M Rosenblum - 38th Annual IEEE/ACM …, 2005 - ieeexplore.ieee.org

In this paper we investigate mapping stream programs (i.e., programs written in a streaming
style for streaming architectures such as Imagine and Raw) onto a general-purpose CPU …

Cited by 169 (7 versions)

Data layout transformation for enhancing data locality on nuca chip multiprocessors

Q Lu, C Alias, U Bondhugula, T Henretty… - 2009 18th …, 2009 - ieeexplore.ieee.org

With increasing numbers of cores, future CMPs (chip multi-processors) are likely to have a
tiled architecture with a portion of shared L2 cache on each tile and a bank-interleaved …

Cited by 110 (19 versions)

Improving locality using loop and data transformations in an integrated framework

Dnnfusion: accelerating deep neural networks execution with advanced operator fusion
