Analytical modeling is enough for high-performance BLIS

TM Low, FD Igual, TM Smith… - ACM Transactions on …, 2016 - dl.acm.org

We show how the BLAS-like Library Instantiation Software (BLIS) framework, which provides
a more detailed layering of the GotoBLAS (now maintained as OpenBLAS) implementation …

Cited by 191 · All 7 versions
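
The BLIS entry above describes a finer layering of the GotoBLAS approach. As a rough illustration only, the sketch below shows the five loops around a small micro-kernel that BLIS uses to structure GEMM. It assumes column-major storage and toy blocking sizes (NC, KC, MC, NR, MR are hypothetical values here, not tuned ones). The packing of A and B into contiguous buffers, which both GotoBLAS and BLIS perform, is left out, and the micro-kernel is plain scalar C rather than the vectorized code a real library would use.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical blocking parameters; a real BLIS build chooses these per
   architecture. Small values keep the sketch easy to test. */
enum { NC = 8, KC = 4, MC = 4, NR = 2, MR = 2 };

/* Column-major indexing: element (i, j) of a matrix with leading dimension ld. */
#define IDX(i, j, ld) ((i) + (size_t)(j) * (ld))

/* Micro-kernel: C[MR x NR] += A[MR x kc] * B[kc x NR], as plain scalar loops. */
static void micro_kernel(int kc, const double *A, int lda,
                         const double *B, int ldb, double *C, int ldc)
{
    for (int p = 0; p < kc; ++p)
        for (int j = 0; j < NR; ++j)
            for (int i = 0; i < MR; ++i)
                C[IDX(i, j, ldc)] += A[IDX(i, p, lda)] * B[IDX(p, j, ldb)];
}

/* C (m x n) += A (m x k) * B (k x n), all column-major.
   Assumes m % MR == 0 and n % NR == 0 so no edge-case micro-tiles are needed. */
void gemm_blocked(int m, int n, int k, const double *A, const double *B, double *C)
{
    assert(m % MR == 0 && n % NR == 0);
    for (int jc = 0; jc < n; jc += NC)          /* loop 5: NC-wide column panels */
        for (int pc = 0; pc < k; pc += KC)      /* loop 4: KC-deep slices of k (B would be packed here) */
            for (int ic = 0; ic < m; ic += MC)  /* loop 3: MC-tall row blocks (A would be packed here) */
                for (int jr = jc; jr < jc + NC && jr < n; jr += NR)        /* loop 2 */
                    for (int ir = ic; ir < ic + MC && ir < m; ir += MR) {  /* loop 1 */
                        int kc = (k - pc < KC) ? k - pc : KC;
                        micro_kernel(kc, &A[IDX(ir, pc, m)], m,
                                     &B[IDX(pc, jr, k)], k,
                                     &C[IDX(ir, jr, m)], m);
                    }
}
```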

Parallel Deep Learning with a hybrid BP-PSO framework for feature extraction and malware classification

MN Al-Andoli, SC Tan, KS Sim, CP Lim, PY Goh - Applied Soft Computing, 2022 - Elsevier

Malicious software (Malware) is a key threat to security of digital networks and systems.
While traditional machine learning methods have been widely used for malware detection …

Cited by 31 · All 6 versions

[PDF] ieee.org

An ensemble-based parallel deep learning classifier with PSO-BP optimization for malware detection

MN Al-Andoli, KS Sim, SC Tan, PY Goh, CP Lim - IEEE Access, 2023 - ieeexplore.ieee.org

Digital networks and systems are susceptible to malicious software (malware) attacks. Deep
learning (DL) models have recently emerged as effective methods to classify and detect …

Cited by 17 · All 7 versions

[PDF] whiterose.ac.uk

A methodology for efficient tile size selection for affine loop kernels

V Kelefouras, K Djemame, G Keramidas… - International Journal of …, 2022 - Springer

Reducing the number of data accesses in memory hierarchy is of paramount importance on
modern computer systems. One of the key optimizations addressing this problem is loop …

Cited by 10 · All 6 versions
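
Tile size selection, the topic of the Kelefouras et al. entry, decides how large each block of a tiled loop nest should be. The sketch below is not their methodology; it uses a common textbook heuristic, assumed here for illustration, that picks the largest square tile T such that one T×T tile each of A, B and C fits in a chosen fraction of the cache, and then runs a tiled matrix multiplication with that T. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical heuristic: largest square tile T (at least 1) such that three
   T x T tiles of elem_bytes-sized elements use at most `fraction` of the cache. */
size_t square_tile_size(size_t cache_bytes, size_t elem_bytes, double fraction)
{
    assert(elem_bytes > 0 && fraction > 0.0 && fraction <= 1.0);
    double budget = (double)cache_bytes * fraction;
    size_t t = 1;
    while (3.0 * (double)(t + 1) * (double)(t + 1) * (double)elem_bytes <= budget)
        ++t;
    return t;
}

/* Tiled C += A * B for n x n row-major matrices with tile size T. */
void matmul_tiled(size_t n, size_t T, const double *A, const double *B, double *C)
{
    assert(T > 0);
    for (size_t ii = 0; ii < n; ii += T)
        for (size_t kk = 0; kk < n; kk += T)
            for (size_t jj = 0; jj < n; jj += T)
                /* Work on one tile of each matrix while it is cache-resident. */
                for (size_t i = ii; i < ii + T && i < n; ++i)
                    for (size_t k = kk; k < kk + T && k < n; ++k) {
                        double a = A[i * n + k];
                        for (size_t j = jj; j < jj + T && j < n; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

For a 32 KiB cache and 8-byte doubles, `square_tile_size(32768, 8, 1.0)` gives 36, since 3 × 36² × 8 = 31104 bytes fits but 3 × 37² × 8 does not. Using a fraction below 1.0 leaves room for other data and for conflict misses, which this simple model ignores.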

[PDF] mdpi.com

An approach for matrix multiplication of 32-bit fixed point numbers by means of 16-bit SIMD instructions on DSP

I Safonov, A Kornilov, D Makienko - Electronics, 2022 - mdpi.com

Matrix multiplication is an important operation for many engineering applications.
Sometimes new features that include matrix multiplication should be added to existing and …

Cited by 6 · All 2 versions
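
The Safonov et al. title describes computing with 32-bit fixed-point numbers by means of 16-bit SIMD instructions. Below is a scalar sketch of the arithmetic identity such an approach can rely on; it is an assumption about the general idea, not the authors' DSP code. Each operand is split into 16-bit halves and the product is assembled from four 16×16→32-bit partial products: a·b = ah·bh·2³² + (ah·bl + al·bh)·2¹⁶ + al·bl. Unsigned Q16.16 format is assumed; signed operands and saturation are not handled.

```c
#include <assert.h>
#include <stdint.h>

/* 16 x 16 -> 32-bit unsigned multiply: the only multiply this sketch uses,
   standing in for one lane of a 16-bit SIMD multiply. */
static uint32_t mul16(uint16_t a, uint16_t b) { return (uint32_t)a * b; }

/* 32 x 32 -> 64-bit unsigned product from four 16-bit partial products. */
uint64_t mul32_via16(uint32_t a, uint32_t b)
{
    assert(sizeof(uint64_t) == 8);
    uint16_t al = (uint16_t)a, ah = (uint16_t)(a >> 16);
    uint16_t bl = (uint16_t)b, bh = (uint16_t)(b >> 16);
    uint64_t lo  = mul16(al, bl);
    uint64_t mid = (uint64_t)mul16(ah, bl) + mul16(al, bh); /* sum can exceed 32 bits */
    uint64_t hi  = mul16(ah, bh);
    return (hi << 32) + (mid << 16) + lo;
}

/* Unsigned Q16.16 multiply: keep bits 16..47 of the 64-bit product.
   Results with an integer part >= 2^16 are silently truncated. */
uint32_t q16_16_mul(uint32_t a, uint32_t b)
{
    return (uint32_t)(mul32_via16(a, b) >> 16);
}
```

For example, 1.5 in Q16.16 is `0x18000` and 2.0 is `0x20000`, and `q16_16_mul(0x18000, 0x20000)` returns `0x30000`, which is 3.0.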

HPMaX: heterogeneous parallel matrix multiplication using CPUs and GPUs

H Kang, HC Kwon, D Kim - Computing, 2020 - Springer

We present a novel heterogeneous parallel matrix multiplication algorithm that utilizes both
central processing units (CPUs) and graphics processing units (GPUs) for large-scale …

Cited by 11 · All 4 versions
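
HPMaX divides a matrix multiplication between CPUs and GPUs. One simple way to split such work, shown here as a hypothetical static scheme rather than the HPMaX algorithm, is to give the GPU a share of the rows of C proportional to its measured throughput, rounded down to a granularity that suits the GPU kernel. The function name and the GFLOP/s inputs are assumptions for illustration.

```c
#include <assert.h>

/* Hypothetical static partition: rows of an m-row output assigned to the GPU,
   proportional to its share of the combined throughput and rounded down to a
   multiple of `granularity`. The CPU handles the remaining rows. */
int gpu_rows(int m, double cpu_gflops, double gpu_gflops, int granularity)
{
    assert(m >= 0 && cpu_gflops >= 0.0 && gpu_gflops > 0.0 && granularity > 0);
    double share = gpu_gflops / (cpu_gflops + gpu_gflops);
    int rows = (int)(share * m + 0.5);          /* round to nearest row */
    rows = (rows / granularity) * granularity;  /* align to the GPU tile height */
    return rows > m ? m : rows;
}
```

With `m = 10000`, a CPU rated at 100 GFLOP/s, a GPU at 900 GFLOP/s and a granularity of 128, the GPU gets 8960 rows and the CPU the remaining 1040. A dynamic scheme would instead re-measure and rebalance during the run.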

[PDF] shu.ac.uk

A high-performance matrix–matrix multiplication methodology for CPU and GPU architectures

V Kelefouras, A Kritikakou, I Mporas… - The Journal of …, 2016 - Springer

Current compilers cannot generate code that can compete with hand-tuned code in
efficiency, even for a simple kernel like matrix–matrix multiplication (MMM). A key step in …

Cited by 26 · All 11 versions

[PDF] unsw.edu.au

Automatic generation of fast BLAS3-GEMM: A portable compiler approach

X Su, X Liao, J Xue - 2017 IEEE/ACM International Symposium …, 2017 - ieeexplore.ieee.org

GEMM is the main computational kernel in BLAS3. Its micro-kernel is either hand-crafted in
assembly code or generated from C code by general-purpose compilers (guided by …

Cited by 19 · All 4 versions
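
The Su, Liao and Xue snippet concerns the GEMM micro-kernel, which is either hand-written in assembly or generated by a compiler from C. The C below sketches what such a micro-kernel source commonly looks like: an MR×NR block of C updated from packed panels of A and B, with partial sums kept in a small local array that an optimizing compiler can map to registers. MR = NR = 4, column-major storage and this particular packing layout are assumptions chosen for illustration, not details taken from the paper.

```c
#include <assert.h>

enum { MR = 4, NR = 4 };  /* hypothetical register-block sizes */

/* Pack an MR x k block of column-major A (leading dimension lda) so that,
   for each p, A(0..MR-1, p) is contiguous. */
void pack_a(int k, const double *A, int lda, double *Ap)
{
    for (int p = 0; p < k; ++p)
        for (int i = 0; i < MR; ++i)
            *Ap++ = A[i + p * lda];
}

/* Pack a k x NR block of column-major B (leading dimension ldb) so that,
   for each p, B(p, 0..NR-1) is contiguous. */
void pack_b(int k, const double *B, int ldb, double *Bp)
{
    for (int p = 0; p < k; ++p)
        for (int j = 0; j < NR; ++j)
            *Bp++ = B[p + j * ldb];
}

/* C[MR x NR] += Ap * Bp as k rank-1 updates, accumulated locally and
   written back to column-major C (leading dimension ldc) once at the end. */
void micro_kernel_4x4(int k, const double *Ap, const double *Bp, double *C, int ldc)
{
    assert(k >= 0);
    double acc[MR][NR] = {{0}};
    for (int p = 0; p < k; ++p, Ap += MR, Bp += NR)
        for (int i = 0; i < MR; ++i)
            for (int j = 0; j < NR; ++j)
                acc[i][j] += Ap[i] * Bp[j];
    for (int j = 0; j < NR; ++j)
        for (int i = 0; i < MR; ++i)
            C[i + j * ldc] += acc[i][j];
}
```

Packing makes every inner-loop load unit-stride, which is what lets either a compiler or a hand-written kernel stream the panels efficiently.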

Performance evaluation of implicit and explicit SIMDization

H Amiri, A Shahbahrami, A Pohl, B Juurlink - Microprocessors and …, 2018 - Elsevier

Processor vendors have been expanding Single Instruction Multiple Data (SIMD) extensions
to exploit data-level-parallelism in their General Purpose Processors (GPPs). Each SIMD …

Cited by 17
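
The Amiri et al. entry compares implicit SIMDization (left to the compiler) with explicit SIMDization (written by the programmer). The two functions below sketch that distinction on a simple saxpy loop. The explicit version uses GCC/Clang vector extensions as a stand-in for vendor intrinsics; this is a compiler-specific feature rather than ISO C, and it is not necessarily one of the approaches the paper evaluates.

```c
#include <assert.h>
#include <string.h>

/* Implicit SIMDization: a plain loop the compiler may auto-vectorize (e.g. at -O3). */
void saxpy_implicit(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* Explicit SIMDization with GCC/Clang vector extensions: four floats per
   operation, followed by a scalar loop for the leftover elements. */
typedef float v4sf __attribute__((vector_size(16)));

void saxpy_explicit(int n, float a, const float *x, float *y)
{
    assert(n >= 0);
    v4sf va = {a, a, a, a};
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        v4sf vx, vy;
        memcpy(&vx, x + i, sizeof vx);  /* memcpy avoids alignment assumptions */
        memcpy(&vy, y + i, sizeof vy);
        vy += va * vx;
        memcpy(y + i, &vy, sizeof vy);
    }
    for (; i < n; ++i)                  /* scalar remainder */
        y[i] += a * x[i];
}
```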

[PDF] ucy.ac.cy

Design and implementation of a highly efficient dgemm for 64-bit armv8 multi-core processors

F Wang, H Jiang, K Zuo, X Su, J Xue… - 2015 44th International …, 2015 - ieeexplore.ieee.org

This paper presents the design and implementation of a highly efficient Double-precision
General Matrix Multiplication (DGEMM) based on Open BLAS for 64-bit ARMv8 eight-core …

Cited by 23 · All 9 versions

A Matrix–Matrix Multiplication methodology for single/multi-core architectures using SIMD
