FireCaffe: near-linear acceleration of deep neural network training on compute clusters
Long training times for high-accuracy deep neural networks (DNNs) impede research into
new DNN architectures and slow the development of high-accuracy DNNs. In this paper we …
GPU-accelerated compression and visualization of large-scale vessel trajectories in maritime IoT industries
Y Huang, Y Li, Z Zhang, RW Liu - IEEE Internet of Things …, 2020 - ieeexplore.ieee.org
The automatic identification system (AIS), an automatic vessel-tracking system, has been
widely adopted to perform intelligent traffic management and collision avoidance services in …
Optimizing depthwise separable convolution operations on gpus
The depthwise separable convolution is commonly seen in convolutional neural networks
(CNNs), and is widely used to reduce the computation overhead of a standard multi-channel …
Swizzle inventor: data movement synthesis for GPU kernels
Utilizing memory and register bandwidth in modern architectures may require swizzles---non-trivial
mappings of data and computations onto hardware resources---such as shuffles. We …
Compact convolutional neural network cascade for face detection
I Kalinovskii, V Spitsyn - arXiv preprint arXiv:1508.01292, 2015 - arxiv.org
The problem of faces detection in images or video streams is a classical problem of
computer vision. The multiple solutions of this problem have been proposed, but the …
Real-time Jones phase microscopy for studying transparent and birefringent specimens
Tissue birefringence is an intrinsic marker of potential value for cancer diagnosis.
Traditionally, birefringence properties have been studied by using intensity-based …
Implementation of the DWT in a GPU through a register-based strategy
The release of the CUDA Kepler architecture in March 2012 has provided Nvidia GPUs with
a larger register memory space and instructions for the communication of registers among …
Exploring the design space of deep convolutional neural networks at large scale
F Iandola - 2016 - search.proquest.com
In recent years, the research community has discovered that deep neural networks (DNNs)
and convolutional neural networks (CNNs) can yield higher accuracy than all previous …
Derivation and analysis of fast bilinear algorithms for convolution
C Ju, E Solomonik - SIAM Review, 2020 - SIAM
The prevalence of convolution in applications within signal processing, deep neural
networks, and numerical solvers has motivated the development of numerous fast …
DMC4ML: Data Movement Complexity for Machine Learning
The greatest demand for today's computing is machine learning. This paper analyzes three
machine learning algorithms: transformers, spatial convolution, and FFT. The analysis is …