Fast convolutional nets with fbfft: A GPU performance evaluation

N Vasilache, J Johnson, M Mathieu, S Chintala… - arXiv preprint arXiv …, 2014 - arxiv.org
We examine the performance profile of Convolutional Neural Network training on the current
generation of NVIDIA Graphics Processing Units. We introduce two new Fast Fourier …

BLIS: A framework for rapidly instantiating BLAS functionality

FG Van Zee, RA Van De Geijn - ACM Transactions on Mathematical …, 2015 - dl.acm.org
The BLAS-like Library Instantiation Software (BLIS) framework is a new infrastructure for
rapidly instantiating Basic Linear Algebra Subprograms (BLAS) functionality. Its fundamental …

Anatomy of high-performance matrix multiplication

K Goto, RA Van De Geijn - ACM Transactions on Mathematical Software (TOMS), 2008 - dl.acm.org
We present the basic principles that underlie the high-performance implementation of the
matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design …

[BOOK][B] Automatic performance tuning of sparse matrix kernels

RW Vuduc - 2003 - search.proquest.com
This dissertation presents an automated system to generate highly efficient, platform-
adapted implementations of sparse matrix kernels. We show that conventional …

FLAME: Formal linear algebra methods environment

JA Gunnels, FG Gustavson, GM Henry… - ACM Transactions on …, 2001 - dl.acm.org
Since the advent of high-performance distributed-memory parallel computing, the need for
intelligible code has become ever greater. The development and maintenance of libraries …

Analytical modeling is enough for high-performance BLIS

TM Low, FD Igual, TM Smith… - ACM Transactions on …, 2016 - dl.acm.org
We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides
a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation …

Rotation left digits to enhance the security level of message blocks cryptography

A Al-Hyari, K Aldebei, ZA Alqadi, B Al-Ahmad - IEEE Access, 2022 - ieeexplore.ieee.org
Due to the availability of several social media platforms and their use in sending text
messages, it is necessary to provide an easy and safe way to protect messages from being …

High performance zero-memory overhead direct convolutions

J Zhang, F Franchetti, TM Low - International Conference on …, 2018 - proceedings.mlr.press
The computation of convolution layers in deep neural networks typically relies on high
performance routines that trade space for time by using additional memory (either for …

Design of a high-performance GEMM-like tensor–tensor multiplication

P Springer, P Bientinesi - ACM Transactions on Mathematical Software …, 2018 - dl.acm.org
We present “GEMM-like Tensor–Tensor multiplication” (GETT), a novel approach for dense
tensor contractions that mirrors the design of a high-performance general matrix–matrix …

High-performance tensor contraction without transposition

DA Matthews - SIAM Journal on Scientific Computing, 2018 - SIAM
Tensor computations, in particular tensor contraction (TC), are important kernels in many
scientific computing applications. Due to the fundamental similarity of TC to matrix …