Optimization techniques for GPU programming
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …
Optimizing depthwise separable convolution operations on GPUs
The depthwise separable convolution is commonly seen in convolutional neural networks
(CNNs), and is widely used to reduce the computation overhead of a standard multi-channel …
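A multiply-accumulate (MAC) count shows how a depthwise separable convolution reduces the cost of a standard multi-channel convolution. The sketch below assumes stride 1, "same" padding and no bias term. The function names are illustrative and do not come from the paper.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    # Each of the h*w*c_out outputs sums over a k*k*c_in input window.
    return h * w * c_out * k * k * c_in


def depthwise_separable_macs(h, w, c_in, c_out, k):
    # Depthwise step: one k*k filter per input channel, no channel mixing.
    depthwise = h * w * c_in * k * k
    # Pointwise step: a 1x1 convolution that mixes c_in channels into c_out.
    pointwise = h * w * c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    h = w = 56
    c_in = c_out = 128
    k = 3
    std = standard_conv_macs(h, w, c_in, c_out, k)
    sep = depthwise_separable_macs(h, w, c_in, c_out, k)
    # The ratio simplifies to 1/c_out + 1/k**2.
    print(std, sep, sep / std)
```

With a 3x3 kernel and 128 output channels, the separable form needs roughly 12% of the MACs of the standard convolution, because the ratio is 1/c_out + 1/k².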
Multi-level encoding and decoding in a scalable photonic tensor processor with a photonic general matrix multiply (GeMM) compiler
The resurgence of artificial intelligence enabled by deep learning and high performance
computing has seen a dramatic increase of demand in the accuracy of deep learning model …
Towards functional safety compliance of matrix–matrix multiplication for machine learning-based autonomous systems
Autonomous systems execute complex tasks to perceive the environment and take self-
aware decisions with limited human interaction. This autonomy is commonly achieved with …
NIOT: A Novel Inference Optimization of Transformers on Modern CPUs
In the machine learning era, model inference efficiency is one of the most important issues
for machine learning systems. It is a major challenge to find the optimal configuration in a …
A methodology for efficient tile size selection for affine loop kernels
Reducing the number of data accesses in memory hierarchy is of paramount importance on
modern computer systems. One of the key optimizations addressing this problem is loop …
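This entry is about choosing tile sizes for loop tiling. Loop tiling splits a loop nest into blocks small enough to stay in cache, which cuts the number of accesses to slower memory levels. The sketch below tiles a matrix multiplication. It picks a square tile size with a simple "three tiles fit in cache" heuristic, which is an assumption for illustration and not the paper's selection method. `max_square_tile` and `matmul_tiled` are hypothetical names.

```python
import math


def max_square_tile(cache_bytes, elem_bytes):
    # Hypothetical heuristic: tiles of A, B and C (3 * T*T elements) must fit in cache.
    return math.isqrt(cache_bytes // (3 * elem_bytes))


def matmul_naive(a, b):
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            aik = a[i][k]
            for j in range(p):
                c[i][j] += aik * b[k][j]
    return c


def matmul_tiled(a, b, tile):
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    # Outer loops walk over tiles; inner loops stay within one tile,
    # so the touched parts of a, b and c can be reused while cached.
    for ii in range(0, n, tile):
        for kk in range(0, m, tile):
            for jj in range(0, p, tile):
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + tile, p)):
                            c[i][j] += aik * b[k][j]
    return c


if __name__ == "__main__":
    # Assumed 32 KiB cache and 8-byte elements.
    t = max_square_tile(32 * 1024, 8)
    a = [[i + j for j in range(70)] for i in range(50)]
    b = [[i * j % 7 for j in range(60)] for i in range(70)]
    print(t, matmul_tiled(a, b, t) == matmul_naive(a, b))
```

Tiling changes only the order of the iterations, not the result. In a compiled language the tiled version touches main memory less often. In pure Python the interpreter overhead hides that benefit, so treat this as a structural illustration only.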
Full-wave-equation depth extrapolation for migration using matrix multiplication
To investigate wavefield depth extrapolation using the full-wave equation, we have derived
a new depth extrapolation scheme for migration using functions of the vertical wavenumber …
How does the performance of NEAT compare to Reinforcement Learning?
M Andersson - 2022 - diva-portal.org
This study examined the relative performance of Deep Reinforcement Learning compared to
a neuroevolution algorithm called NEAT when used to train AIs in a discrete game …
Coordinated DMA: improving the DRAM access efficiency for matrix multiplication
S Ma, Z Liu, S Chen, L Huang, Y Guo… - … on Parallel and …, 2019 - ieeexplore.ieee.org
High performance implementation of matrix multiplication is essential for scientific
computing. The memory access procedure is quite possible to be the bottleneck of matrix …
Optimizing General Matrix Multiplications on Modern Multi-core DSPs
General Matrix Multiplication (GEMM) is a key subprogram in high-performance computing
(HPC) and deep learning workloads. With the rising significance of power and energy …