- Academic Search

FireCaffe: near-linear acceleration of deep neural network training on compute clusters

FN Iandola, MW Moskewicz… - Proceedings of the …, 2016 - openaccess.thecvf.com

Long training times for high-accuracy deep neural networks (DNNs) impede research into
new DNN architectures and slow the development of high-accuracy DNNs. In this paper we …

Cited by 402

[PDF] arxiv.org

GPU-accelerated compression and visualization of large-scale vessel trajectories in maritime IoT industries

Y Huang, Y Li, Z Zhang, RW Liu - IEEE Internet of Things …, 2020 - ieeexplore.ieee.org

The automatic identification system (AIS), an automatic vessel-tracking system, has been
widely adopted to perform intelligent traffic management and collision avoidance services in …

Cited by 113

[PDF] whiterose.ac.uk

Optimizing depthwise separable convolution operations on GPUs

G Lu, W Zhang, Z Wang - IEEE Transactions on Parallel and …, 2021 - ieeexplore.ieee.org

The depthwise separable convolution is commonly seen in convolutional neural networks
(CNNs), and is widely used to reduce the computation overhead of a standard multi-channel …

Cited by 60

[PDF] acm.org

Swizzle inventor: data movement synthesis for GPU kernels

PM Phothilimthana, AS Elliott, A Wang… - Proceedings of the …, 2019 - dl.acm.org

Utilizing memory and register bandwidth in modern architectures may require swizzles---non-trivial
mappings of data and computations onto hardware resources---such as shuffles. We …

Cited by 59

[PDF] arxiv.org

Compact convolutional neural network cascade for face detection

I Kalinovskii, V Spitsyn - arXiv preprint arXiv:1508.01292, 2015 - arxiv.org

The problem of faces detection in images or video streams is a classical problem of
computer vision. The multiple solutions of this problem have been proposed, but the …

Cited by 54

[PDF] optica.org

Real-time Jones phase microscopy for studying transparent and birefringent specimens

Y Jiao, ME Kandel, X Liu, W Lu, G Popescu - Optics Express, 2020 - opg.optica.org

Tissue birefringence is an intrinsic marker of potential value for cancer diagnosis.
Traditionally, birefringence properties have been studied by using intensity-based …

Cited by 22

[PDF] core.ac.uk

Implementation of the DWT in a GPU through a register-based strategy

P Enfedaque, F Auli-Llinas… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org

The release of the CUDA Kepler architecture in March 2012 has provided Nvidia GPUs with
a larger register memory space and instructions for the communication of registers among …

Cited by 45

[PDF] arxiv.org

[BOOK][B] Exploring the design space of deep convolutional neural networks at large scale

F Iandola - 2016 - search.proquest.com

In recent years, the research community has discovered that deep neural networks (DNNs)
and convolutional neural networks (CNNs) can yield higher accuracy than all previous …

Cited by 30

[PDF] arxiv.org

Derivation and analysis of fast bilinear algorithms for convolution

C Ju, E Solomonik - SIAM Review, 2020 - SIAM

The prevalence of convolution in applications within signal processing, deep neural
networks, and numerical solvers has motivated the development of numerous fast …

Cited by 14

[PDF] arxiv.org

DMC4ML: Data Movement Complexity for Machine Learning

C Ding, C Kanan, D McKellips, T Ozawa… - arXiv preprint arXiv …, 2023 - arxiv.org

The greatest demand for today's computing is machine learning. This paper analyzes three
machine learning algorithms: transformers, spatial convolution, and FFT. The analysis is …

Communication-minimizing 2D convolution in GPU registers

Firecaffe: near-linear acceleration of deep neural network training on compute clusters