DNNFusion: accelerating deep neural networks execution with advanced operator fusion

W Niu, J Guan, Y Wang, G Agrawal, B Ren - Proceedings of the 42nd …, 2021 - dl.acm.org
Deep Neural Networks (DNNs) have emerged as the core enabler of many major
applications on mobile devices. To achieve high accuracy, DNN models have become …

Data reorganization in memory using 3D-stacked DRAM

B Akin, F Franchetti, JC Hoe - ACM SIGARCH Computer Architecture …, 2015 - dl.acm.org
In this paper we focus on common data reorganization operations such as shuffle,
pack/unpack, swap, transpose, and layout transformations. Although these operations …

The design and use of SimplePower: a cycle-accurate energy estimation tool

W Ye, N Vijaykrishnan, M Kandemir… - Proceedings of the 37th …, 2000 - dl.acm.org
In this paper, we present the design and use of a comprehensive framework, SimplePower,
for evaluating the effect of high-level algorithmic, architectural, and compilation trade-offs on …

Tiling optimizations for 3D scientific computations

G Rivera, CW Tseng - SC'00: Proceedings of the 2000 ACM …, 2000 - ieeexplore.ieee.org
Compiler transformations can significantly improve data locality for many scientific programs.
In this paper, we show iterative solvers for partial differential equations (PDEs) in three …

Energy-driven integrated hardware-software optimizations using SimplePower

N Vijaykrishnan, M Kandemir, MJ Irwin… - ACM SIGARCH …, 2000 - dl.acm.org
With the emergence of a plethora of embedded and portable applications, energy
dissipation has joined throughput, area, and accuracy/precision as a major design …

Influence of compiler optimizations on system power

M Kandemir, N Vijaykrishnan, MJ Irwin… - Proceedings of the 37th …, 2000 - dl.acm.org
High-level compiler optimizations have been widely used to achieve speedups on
array-based codes. Such optimizations are becoming increasingly important in embedded signal …

Compile-time composition of run-time data and iteration reorderings

MM Strout, L Carter, J Ferrante - Proceedings of the ACM SIGPLAN 2003 …, 2003 - dl.acm.org
Many important applications, such as those using sparse data structures, have memory
reference patterns that are unknown at compile-time. Prior work has developed run-time …

Tiling, block data layout, and memory hierarchy performance

N Park, B Hong, VK Prasanna - IEEE Transactions on Parallel …, 2003 - ieeexplore.ieee.org
Recently, several experimental studies have been conducted on block data layout in
conjunction with tiling as a data transformation technique to improve cache performance. In …

Stream programming on general-purpose processors

J Gummaraju, M Rosenblum - 38th Annual IEEE/ACM …, 2005 - ieeexplore.ieee.org
In this paper we investigate mapping stream programs (i.e., programs written in a streaming
style for streaming architectures such as Imagine and Raw) onto a general-purpose CPU …

Data layout transformation for enhancing data locality on NUCA chip multiprocessors

Q Lu, C Alias, U Bondhugula, T Henretty… - 2009 18th …, 2009 - ieeexplore.ieee.org
With increasing numbers of cores, future CMPs (chip multi-processors) are likely to have a
tiled architecture with a portion of shared L2 cache on each tile and a bank-interleaved …