Sparse gpu kernels for deep learning
Scientific workloads have traditionally exploited high levels of sparsity to accelerate
computation and reduce memory requirements. While deep neural networks can be made …
computation and reduce memory requirements. While deep neural networks can be made …
Megablocks: Efficient sparse training with mixture-of-experts
We present MegaBlocks, a system for efficient Mixture-of-Experts (MoE) training on GPUs.
Our system ismotivated by the limitations of current frameworks, which restrict the dynamic …
Our system ismotivated by the limitations of current frameworks, which restrict the dynamic …
The tensor algebra compiler
Tensor algebra is a powerful tool with applications in machine learning, data analytics,
engineering and the physical sciences. Tensors are often sparse and compound operations …
engineering and the physical sciences. Tensors are often sparse and compound operations …
ThunderSVM: A fast SVM library on GPUs and CPUs
Support Vector Machines (SVMs) are classic supervised learning models for classification,
regression and distribution estimation. A survey conducted by Kaggle in 2017 shows that …
regression and distribution estimation. A survey conducted by Kaggle in 2017 shows that …
CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication
W Liu, B Vinter - Proceedings of the 29th ACM on International …, 2015 - dl.acm.org
Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous
applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage …
applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage …
Faster cnns with direct sparse convolutions and guided pruning
Phenomenally successful in practical inference problems, convolutional neural networks
(CNN) are widely deployed in mobile devices, data centers, and even supercomputers. The …
(CNN) are widely deployed in mobile devices, data centers, and even supercomputers. The …
Sparse tensor core: Algorithm and hardware co-design for vector-wise sparse neural networks on modern gpus
Deep neural networks have become the compelling solution for the applications such as
image classification, object detection, speech recognition, and machine translation …
image classification, object detection, speech recognition, and machine translation …
The Combinatorial BLAS: Design, implementation, and applications
A Buluç, JR Gilbert - The International Journal of High …, 2011 - journals.sagepub.com
This paper presents a scalable high-performance software library to be used for graph
analysis and data mining. Large combinatorial graphs appear in many applications of high …
analysis and data mining. Large combinatorial graphs appear in many applications of high …
A recursive algebraic coloring technique for hardware-efficient symmetric sparse matrix-vector multiplication
C Alappat, A Basermann, AR Bishop… - ACM Transactions on …, 2020 - dl.acm.org
The symmetric sparse matrix-vector multiplication (SymmSpMV) is an important building
block for many numerical linear algebra kernel operations or graph traversal applications …
block for many numerical linear algebra kernel operations or graph traversal applications …
Tensaurus: A versatile accelerator for mixed sparse-dense tensor computations
N Srivastava, H **, S Smith, H Rong… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Tensor factorizations are powerful tools in many machine learning and data analytics
applications. Tensors are often sparse, which makes sparse tensor factorizations memory …
applications. Tensors are often sparse, which makes sparse tensor factorizations memory …