- Academic Search

Cnvlutin: Ineffectual-neuron-free deep neural network computing

J Albericio, P Judd, T Hetherington, T Aamodt… - ACM SIGARCH …, 2016 - dl.acm.org

This work observes that a large fraction of the computations performed by Deep Neural
Networks (DNNs) are intrinsically ineffectual as they involve a multiplication where one of …

Cited by 942

MIMD Programs Execution Support on SIMD Machines: A Holistic Survey

D Mustafa, R Alkhasawneh, F Obeidat… - IEEE Access, 2024 - ieeexplore.ieee.org

The Single Instruction Multiple Data (SIMD) architecture, supported by various
high-performance computing platforms, efficiently utilizes data-level parallelism. The SIMD model …

Cited by 5

Scaling the power wall: a path to exascale

O Villa, DR Johnson, M O'Connor… - SC'14: Proceedings …, 2014 - ieeexplore.ieee.org

Modern scientific discovery is driven by an insatiable demand for computing performance.
The HPC community is targeting development of supercomputers able to sustain 1 ExaFlops …

Cited by 187

Warped-compression: Enabling power efficient GPUs through register compression

S Lee, K Kim, G Koo, H Jeon, WW Ro… - ACM SIGARCH …, 2015 - dl.acm.org

This paper presents Warped-Compression, a warp-level register compression scheme for
reducing GPU power consumption. This work is motivated by the observation that the …

Cited by 149

Flexible software profiling of GPU architectures

M Stephenson, SK Sastry Hari, Y Lee… - Proceedings of the …, 2015 - dl.acm.org

To aid application characterization and architecture design space exploration, researchers
and engineers have developed a wide range of tools for CPUs, including simulators …

Cited by 136

CUDAAdvisor: LLVM-based runtime profiling for modern GPUs

D Shen, SL Song, A Li, X Liu - … of the 2018 International Symposium on …, 2018 - dl.acm.org

General-purpose GPUs have been widely utilized to accelerate parallel applications. Given
a relatively complex programming model and fast architecture evolution, producing efficient …

Cited by 55

Partial control-flow linearization

S Moll, S Hack - ACM SIGPLAN Notices, 2018 - dl.acm.org

If-conversion is a fundamental technique for vectorization. It accounts for the fact that in a
SIMD program, several targets of a branch might be executed because of divergence …

Cited by 51

A sparse probabilistic learning algorithm for real-time tracking

Blake, Cipolla - Proceedings Ninth IEEE International …, 2003 - ieeexplore.ieee.org

We address the problem of applying powerful pattern recognition algorithms based on
kernels to efficient visual tracking. Recently S. Avidan (2001) has shown that object …

Cited by 160

SPRING: A sparsity-aware reduced-precision monolithic 3D CNN accelerator architecture for training and inference

Y Yu, NK Jha - IEEE Transactions on Emerging Topics in …, 2020 - ieeexplore.ieee.org

Convolutional neural networks (CNNs) outperform traditional machine learning algorithms
across a wide range of applications, such as object recognition, image segmentation, and …

Cited by 33

R2D2: Removing ReDunDancy Utilizing Linearity of Address Generation in GPUs

D Ha, Y Oh, WW Ro - Proceedings of the 50th Annual International …, 2023 - dl.acm.org

A generally used GPU programming methodology is that adjacent threads access data in
neighbor or specific-stride memory addresses and perform computations with the fetched …

Cited by 7

Convergence and scalarization for data-parallel architectures

Cnvlutin: Ineffectual-neuron-free deep neural network computing
