DNNFusion: accelerating deep neural networks execution with advanced operator fusion
Deep Neural Networks (DNNs) have emerged as the core enabler of many major
applications on mobile devices. To achieve high accuracy, DNN models have become …
Data reorganization in memory using 3D-stacked DRAM
In this paper we focus on common data reorganization operations such as shuffle,
pack/unpack, swap, transpose, and layout transformations. Although these operations …
The design and use of SimplePower: a cycle-accurate energy estimation tool
In this paper, we present the design and use of a comprehensive framework, SimplePower,
for evaluating the effect of high-level algorithmic, architectural, and compilation trade-offs on …
Tiling optimizations for 3D scientific computations
G Rivera, CW Tseng - SC'00: Proceedings of the 2000 ACM …, 2000 - ieeexplore.ieee.org
Compiler transformations can significantly improve data locality for many scientific programs.
In this paper, we show iterative solvers for partial differential equations (PDEs) in three …
Energy-driven integrated hardware-software optimizations using SimplePower
With the emergence of a plethora of embedded and portable applications, energy
dissipation has joined throughput, area, and accuracy/precision as a major design …
Influence of compiler optimizations on system power
High-level compiler optimizations have been widely used to achieve speedups on array-based
codes. Such optimizations are becoming increasingly important in embedded signal …
Compile-time composition of run-time data and iteration reorderings
MM Strout, L Carter, J Ferrante - Proceedings of the ACM SIGPLAN 2003 …, 2003 - dl.acm.org
Many important applications, such as those using sparse data structures, have memory
reference patterns that are unknown at compile-time. Prior work has developed run-time …
Tiling, block data layout, and memory hierarchy performance
Recently, several experimental studies have been conducted on block data layout in
conjunction with tiling as a data transformation technique to improve cache performance. In …
Stream programming on general-purpose processors
J Gummaraju, M Rosenblum - 38th Annual IEEE/ACM …, 2005 - ieeexplore.ieee.org
In this paper we investigate mapping stream programs (i.e., programs written in a streaming
style for streaming architectures such as Imagine and Raw) onto a general-purpose CPU …
Data layout transformation for enhancing data locality on NUCA chip multiprocessors
With increasing numbers of cores, future CMPs (chip multi-processors) are likely to have a
tiled architecture with a portion of shared L2 cache on each tile and a bank-interleaved …