EIE: Efficient inference engine on compressed deep neural network

S Han, X Liu, H Mao, J Pu, A Pedram… - ACM SIGARCH …, 2016 - dl.acm.org
State-of-the-art deep neural networks (DNNs) have hundreds of millions of connections and
are both computationally and memory intensive, making them difficult to deploy on …

SCNN: An accelerator for compressed-sparse convolutional neural networks

A Parashar, M Rhu, A Mukkara, A Puglielli… - ACM SIGARCH …, 2017 - dl.acm.org
Convolutional Neural Networks (CNNs) have emerged as a fundamental technology for
machine learning. High performance and extreme energy efficiency are critical for …

Hardware and software optimizations for accelerating deep neural networks: Survey of current trends, challenges, and the road ahead

M Capra, B Bussolino, A Marchisio, G Masera… - IEEE …, 2020 - ieeexplore.ieee.org
Currently, Machine Learning (ML) is becoming ubiquitous in everyday life. Deep Learning
(DL) is already present in many applications ranging from computer vision for medicine to …

Implementing sparse matrix-vector multiplication on throughput-oriented processors

N Bell, M Garland - Proceedings of the conference on high performance …, 2009 - dl.acm.org
Sparse matrix-vector multiplication (SpMV) is of singular importance in sparse linear
algebra. In contrast to the uniform regularity of dense linear algebra, sparse operations …

The Chinese Wall Security Policy

DFC Brewer, MJ Nash - S&P, 1989 - facweb.iitkgp.ac.in
Everyone who has seen the movie Wall Street will have seen a commercial security policy in
action. The recent work of Clark and Wilson and the WIPCIS initiative (the Workshop on …

Communication lower bounds and optimal algorithms for numerical linear algebra

G Ballard, E Carson, J Demmel, M Hoemmen… - Acta Numerica, 2014 - cambridge.org
The traditional metric for the efficiency of a numerical algorithm has been the number of
arithmetic operations it performs. Technological trends have long been reducing the time to …

Optimization of sparse matrix-vector multiplication on emerging multicore platforms

S Williams, L Oliker, R Vuduc, J Shalf, K Yelick… - Proceedings of the …, 2007 - dl.acm.org
We are witnessing a dramatic change in computer architecture due to the multicore
paradigm shift, as every electronic device from cell phones to supercomputers confronts …

CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication

W Liu, B Vinter - Proceedings of the 29th ACM on International …, 2015 - dl.acm.org
Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous
applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage …

OSKI: A library of automatically tuned sparse matrix kernels

R Vuduc, JW Demmel, KA Yelick - Journal of Physics …, 2005 - iopscience.iop.org
The Optimized Sparse Kernel Interface (OSKI) is a collection of low-level primitives
that provide automatically tuned computational kernels on sparse matrices, for use by solver …

Model-driven autotuning of sparse matrix-vector multiply on GPUs

JW Choi, A Singh, RW Vuduc - ACM SIGPLAN Notices, 2010 - dl.acm.org
We present a performance model-driven framework for automated performance tuning
(autotuning) of sparse matrix-vector multiply (SpMV) on systems accelerated by graphics …