FireCaffe: near-linear acceleration of deep neural network training on compute clusters

FN Iandola, MW Moskewicz… - Proceedings of the …, 2016 - openaccess.thecvf.com
Long training times for high-accuracy deep neural networks (DNNs) impede research into
new DNN architectures and slow the development of high-accuracy DNNs. In this paper we …

GPU-accelerated compression and visualization of large-scale vessel trajectories in maritime IoT industries

Y Huang, Y Li, Z Zhang, RW Liu - IEEE Internet of Things …, 2020 - ieeexplore.ieee.org
The automatic identification system (AIS), an automatic vessel-tracking system, has been
widely adopted to perform intelligent traffic management and collision avoidance services in …

Optimizing depthwise separable convolution operations on GPUs

G Lu, W Zhang, Z Wang - IEEE Transactions on Parallel and …, 2021 - ieeexplore.ieee.org
The depthwise separable convolution is commonly seen in convolutional neural networks
(CNNs), and is widely used to reduce the computation overhead of a standard multi-channel …

Swizzle inventor: data movement synthesis for GPU kernels

PM Phothilimthana, AS Elliott, A Wang… - Proceedings of the …, 2019 - dl.acm.org
Utilizing memory and register bandwidth in modern architectures may require swizzles---non-
trivial mappings of data and computations onto hardware resources---such as shuffles. We …

Compact convolutional neural network cascade for face detection

I Kalinovskii, V Spitsyn - arXiv preprint arXiv:1508.01292, 2015 - arxiv.org
The problem of faces detection in images or video streams is a classical problem of
computer vision. The multiple solutions of this problem have been proposed, but the …

Real-time Jones phase microscopy for studying transparent and birefringent specimens

Y Jiao, ME Kandel, X Liu, W Lu, G Popescu - Optics Express, 2020 - opg.optica.org
Tissue birefringence is an intrinsic marker of potential value for cancer diagnosis.
Traditionally, birefringence properties have been studied by using intensity-based …

Implementation of the DWT in a GPU through a register-based strategy

P Enfedaque, F Auli-Llinas… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
The release of the CUDA Kepler architecture in March 2012 has provided Nvidia GPUs with
a larger register memory space and instructions for the communication of registers among …

[BOOK] Exploring the design space of deep convolutional neural networks at large scale

F Iandola - 2016 - search.proquest.com
In recent years, the research community has discovered that deep neural networks (DNNs)
and convolutional neural networks (CNNs) can yield higher accuracy than all previous …

Derivation and analysis of fast bilinear algorithms for convolution

C Ju, E Solomonik - SIAM Review, 2020 - SIAM
The prevalence of convolution in applications within signal processing, deep neural
networks, and numerical solvers has motivated the development of numerous fast …

DMC4ML: Data Movement Complexity for Machine Learning

C Ding, C Kanan, D McKellips, T Ozawa… - arXiv preprint arXiv …, 2023 - arxiv.org
The greatest demand for today's computing is machine learning. This paper analyzes three
machine learning algorithms: transformers, spatial convolution, and FFT. The analysis is …