A survey on efficient convolutional neural networks and hardware acceleration

D Ghimire, D Kil, S Kim - Electronics, 2022 - mdpi.com
Over the past decade, deep-learning-based representations have demonstrated remarkable
performance in academia and industry. The learning capability of convolutional neural …

Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising topic against the backdrop of slowing improvement
in general-purpose processors due to the foreseeable end of Moore's Law …

Spike-driven transformer

M Yao, J Hu, Z Zhou, L Yuan, Y Tian… - Advances in neural …, 2024 - proceedings.neurips.cc
Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option
due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we …

Going deeper with image transformers

H Touvron, M Cord, A Sablayrolles… - Proceedings of the …, 2021 - openaccess.thecvf.com
Transformers have been recently adapted for large-scale image classification, achieving
high scores that shake up the long supremacy of convolutional neural networks. However, the …

RepVGG: Making VGG-style convnets great again

X Ding, X Zhang, N Ma, J Han… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present a simple but powerful convolutional neural network architecture, which has a
VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and …

Dive into deep learning

A Zhang, ZC Lipton, M Li, AJ Smola - arXiv preprint arXiv:2106.11342, 2021 - arxiv.org
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …

On layer normalization in the transformer architecture

R Xiong, Y Yang, D He, K Zheng… - International …, 2020 - proceedings.mlr.press
The Transformer is widely used in natural language processing tasks. To train a Transformer,
however, one usually needs a carefully designed learning rate warm-up stage, which is …

Picking winning tickets before training by preserving gradient flow

C Wang, G Zhang, R Grosse - arXiv preprint arXiv:2002.07376, 2020 - arxiv.org
Overparameterization has been shown to benefit both the optimization and generalization of
neural networks, but large networks are resource hungry at both training and test time …

Wide neural networks of any depth evolve as linear models under gradient descent

J Lee, L Xiao, S Schoenholz, Y Bahri… - Advances in neural …, 2019 - proceedings.neurips.cc
A longstanding goal in deep learning research has been to precisely characterize training
and generalization. However, the often complex loss landscapes of neural networks have …

Pre-training via denoising for molecular property prediction

S Zaidi, M Schaarschmidt, J Martens, H Kim… - arXiv preprint arXiv …, 2022 - arxiv.org
Many important problems involving molecular property prediction from 3D structures have
limited data, posing a generalization challenge for neural networks. In this paper, we …