Fast convolutional nets with fbfft: A GPU performance evaluation

N Vasilache, J Johnson, M Mathieu, S Chintala… - arXiv preprint arXiv …, 2014 - arxiv.org

We examine the performance profile of Convolutional Neural Network training on the current
generation of NVIDIA Graphics Processing Units. We introduce two new Fast Fourier …

Cited by 421 · All 7 versions
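The abstract above concerns FFT-based convolution for CNN training. Such implementations rest on the convolution theorem: a convolution in the signal domain becomes a pointwise product in the frequency domain. The following is a minimal plain-Python sketch that checks this identity for a 1-D case. It uses a naive O(n²) DFT as a stand-in for a real FFT, it is not fbfft's algorithm or API, and the function names are hypothetical.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform; an FFT computes the same values faster."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)  # inverse transform is scaled by 1/n
    return out


def conv_direct(signal, kernel):
    """Full linear 1-D convolution computed with a direct double loop."""
    n = len(signal) + len(kernel) - 1
    out = [0.0] * n
    for i, s in enumerate(signal):
        for j, k in enumerate(kernel):
            out[i + j] += s * k
    return out


def conv_fourier(signal, kernel):
    """Same convolution via the convolution theorem."""
    n = len(signal) + len(kernel) - 1
    # Zero-pad both inputs to length n so the circular convolution
    # computed by the DFT equals the linear convolution.
    a = list(signal) + [0.0] * (n - len(signal))
    b = list(kernel) + [0.0] * (n - len(kernel))
    product = [x * y for x, y in zip(dft(a), dft(b))]  # pointwise product in frequency space
    return [v.real for v in dft(product, inverse=True)]


if __name__ == "__main__":
    sig = [1.0, 2.0, 3.0, 4.0]
    ker = [0.5, -1.0, 2.0]
    print(conv_direct(sig, ker))
    print([round(v, 6) for v in conv_fourier(sig, ker)])
```

Convolution layers in deep-learning frameworks usually compute cross-correlation. Flipping the kernel converts one into the other, so the same frequency-domain approach applies.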


[PDF] osti.gov

BLIS: A framework for rapidly instantiating BLAS functionality

FG Van Zee, RA Van De Geijn - ACM Transactions on Mathematical …, 2015 - dl.acm.org

The BLAS-like Library Instantiation Software (BLIS) framework is a new infrastructure for
rapidly instantiating Basic Linear Algebra Subprograms (BLAS) functionality. Its fundamental …

Cited by 453 · All 7 versions


[PDF] psu.edu

Anatomy of high-performance matrix multiplication

K Goto, RA Van De Geijn - ACM Transactions on Mathematical Software (TOMS), 2008 - dl.acm.org

We present the basic principles that underlie the high-performance implementation of the
matrix-matrix multiplication that is part of the widely used GotoBLAS library. Design …

Cited by 1019 · All 17 versions
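The GotoBLAS approach partitions the operands into blocks sized to fit the cache hierarchy and iterates over them in a fixed loop order. The following is a minimal sketch of that loop layering only: a column panel of B, then a block of the shared dimension, then a row block of A. It assumes hypothetical block sizes `mc`, `kc` and `nc`. It omits packing blocks into contiguous buffers and the register-level micro-kernel, which are what make real implementations fast, and pure Python will not show any speedup.

```python
def gemm_naive(A, B):
    """Reference C = A @ B for lists of lists."""
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for p in range(k):
            a = A[i][p]
            for j in range(n):
                C[i][j] += a * B[p][j]
    return C


def gemm_blocked(A, B, mc=2, kc=2, nc=2):
    """C = A @ B computed block by block.

    mc, kc, nc are hypothetical block sizes; a tuned library would derive
    them from cache sizes and would also pack each block (omitted here).
    """
    m, k, n = len(A), len(B), len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for jc in range(0, n, nc):          # column panel of B and C
        for pc in range(0, k, kc):      # block of the shared dimension (rank-kc update)
            for ic in range(0, m, mc):  # row block of A and C
                # Inner loops: update the mc x nc block of C with an mc x kc block of A
                # times a kc x nc block of B; min() handles ragged edge blocks.
                for i in range(ic, min(ic + mc, m)):
                    for p in range(pc, min(pc + kc, k)):
                        a = A[i][p]
                        for j in range(jc, min(jc + nc, n)):
                            C[i][j] += a * B[p][j]
    return C


if __name__ == "__main__":
    A = [[float(i * 5 + j) for j in range(5)] for i in range(3)]
    B = [[float(i - j) for j in range(4)] for i in range(5)]
    print(gemm_blocked(A, B) == gemm_naive(A, B))
```

The blocked version performs exactly the same multiply-adds as the naive one. Only the order in which C is visited changes, which is what lets each block of A and B stay resident in cache while it is reused.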


[PDF] psu.edu

[BOOK] Automatic performance tuning of sparse matrix kernels

RW Vuduc - 2003 - search.proquest.com

This dissertation presents an automated system to generate highly efficient, platform-
adapted implementations of sparse matrix kernels. We show that conventional …

Cited by 365 · All 11 versions


[PDF] academia.edu

FLAME: Formal linear algebra methods environment

JA Gunnels, FG Gustavson, GM Henry… - ACM Transactions on …, 2001 - dl.acm.org

Since the advent of high-performance distributed-memory parallel computing, the need for
intelligible code has become ever greater. The development and maintenance of libraries …

Cited by 386 · All 8 versions


[PDF] acm.org

Analytical modeling is enough for high-performance BLIS

TM Low, FD Igual, TM Smith… - ACM Transactions on …, 2016 - dl.acm.org

We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides
a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation …

Cited by 191 · All 7 versions


[PDF] ieee.org

Rotation left digits to enhance the security level of message blocks cryptography

A Al-Hyari, K Aldebei, ZA Alqadi, B Al-Ahmad - IEEE Access, 2022 - ieeexplore.ieee.org

Due to the availability of several social media platforms and their use in sending text
messages, it is necessary to provide an easy and safe way to protect messages from being …

Cited by 39 · All 2 versions


[PDF] mlr.press

High performance zero-memory overhead direct convolutions

J Zhang, F Franchetti, TM Low - International Conference on …, 2018 - proceedings.mlr.press

The computation of convolution layers in deep neural networks typically relies on high
performance routines that trade space for time by using additional memory (either for …

Cited by 103 · All 7 versions
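A widely used example of a routine that trades space for time is im2col lowering. It copies every input patch into a row of a matrix so that convolution becomes a matrix product. The following is a minimal plain-Python sketch that contrasts im2col with a direct loop nest for a single-channel, stride-1, unpadded 2-D case. It illustrates the memory overhead only. It is not the blocked direct-convolution algorithm from the paper, and the function names are hypothetical.

```python
def conv2d_direct(x, w):
    """Direct 'valid' 2-D cross-correlation; allocates nothing beyond the output."""
    H, W = len(x), len(x[0])
    R, S = len(w), len(w[0])
    out = [[0.0] * (W - S + 1) for _ in range(H - R + 1)]
    for i in range(H - R + 1):
        for j in range(W - S + 1):
            acc = 0.0
            for r in range(R):
                for s in range(S):
                    acc += x[i + r][j + s] * w[r][s]
            out[i][j] = acc
    return out


def im2col(x, R, S):
    """Copy each R x S input patch into one row: (H-R+1)*(W-S+1) rows of R*S values."""
    H, W = len(x), len(x[0])
    return [[x[i + r][j + s] for r in range(R) for s in range(S)]
            for i in range(H - R + 1) for j in range(W - S + 1)]


def conv2d_im2col(x, w):
    """Same result via lowering: patch matrix times the flattened filter.

    With one filter this is a matrix-vector product; with many filters
    (and channels) it becomes the GEMM that libraries accelerate.
    """
    R, S = len(w), len(w[0])
    cols = im2col(x, R, S)
    flat_w = [w[r][s] for r in range(R) for s in range(S)]
    out_w = len(x[0]) - S + 1
    vals = [sum(a * b for a, b in zip(row, flat_w)) for row in cols]
    # Reshape the flat result back into output rows.
    return [vals[k:k + out_w] for k in range(0, len(vals), out_w)]


if __name__ == "__main__":
    x = [[float(i * 5 + j) for j in range(5)] for i in range(5)]
    w = [[1.0, 0.0, -1.0], [2.0, 0.0, -2.0], [1.0, 0.0, -1.0]]
    print(conv2d_direct(x, w) == conv2d_im2col(x, w))
    cols = im2col(x, 3, 3)
    print(len(cols) * len(cols[0]), "buffer elements for a 25-element input")
```

For a 5 × 5 input and a 3 × 3 filter, the im2col buffer holds 9 patches of 9 values each, that is 81 elements. That is more than three times the 25-element input, while the direct version allocates only the 3 × 3 output.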


[PDF] semanticscholar.org

Design of a high-performance GEMM-like tensor–tensor multiplication

P Springer, P Bientinesi - ACM Transactions on Mathematical Software …, 2018 - dl.acm.org

We present “GEMM-like Tensor–Tensor multiplication” (GETT), a novel approach for dense
tensor contractions that mirrors the design of a high-performance general matrix–matrix …

Cited by 118 · All 7 versions


[PDF] arxiv.org

High-performance tensor contraction without transposition

DA Matthews - SIAM Journal on Scientific Computing, 2018 - SIAM

Tensor computations, in particular tensor contraction (TC), are important kernels in many
scientific computing applications. Due to the fundamental similarity of TC to matrix …

Cited by 116 · All 7 versions
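The abstract points to the similarity between tensor contraction and matrix multiplication. Consider the contraction C[a,b] = sum over c,d of A[a,c,d] * B[c,d,b], with row-major storage. The contracted indices c and d are adjacent, so A's flat buffer already is an n_a × (n_c·n_d) matrix and B's buffer is an (n_c·n_d) × n_b matrix. The contraction is therefore an ordinary matrix product with no transposition. The following minimal sketch checks this index mapping. It does not reproduce the paper's method for general index orders, which the excerpt does not describe, and the function names are hypothetical.

```python
def contract_naive(A, B, na, nc, nd, nb):
    """C[a][b] = sum_{c,d} A[a,c,d] * B[c,d,b]; A and B are flat row-major lists."""
    C = [[0.0] * nb for _ in range(na)]
    for a in range(na):
        for b in range(nb):
            acc = 0.0
            for c in range(nc):
                for d in range(nd):
                    acc += A[(a * nc + c) * nd + d] * B[(c * nd + d) * nb + b]
            C[a][b] = acc
    return C


def contract_as_gemm(A, B, na, nc, nd, nb):
    """Same contraction treated as GEMM on the unmodified buffers.

    Fusing (c, d) into one index p = c * nd + d makes A an na x k and B a
    k x nb row-major matrix (k = nc * nd), so no copy or transpose is needed.
    """
    k = nc * nd
    C = [[0.0] * nb for _ in range(na)]
    for a in range(na):
        for p in range(k):
            av = A[a * k + p]
            for b in range(nb):
                C[a][b] += av * B[p * nb + b]
    return C


if __name__ == "__main__":
    na, nc, nd, nb = 2, 3, 2, 4
    A = [float(i % 7) for i in range(na * nc * nd)]
    B = [float((i * 3) % 5) for i in range(nc * nd * nb)]
    print(contract_naive(A, B, na, nc, nd, nb) == contract_as_gemm(A, B, na, nc, nd, nb))
```

When the contracted indices are interleaved with the free ones (for example A[c,a,d]), this simple reinterpretation no longer applies. A traditional route is to transpose an operand into the grouped layout first. Avoiding that extra copy is the problem the title refers to.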
