A survey on efficient convolutional neural networks and hardware acceleration
Over the past decade, deep-learning-based representations have demonstrated remarkable
performance in academia and industry. The learning capability of convolutional neural …
Model compression and hardware acceleration for neural networks: A comprehensive survey
Domain-specific hardware is becoming a promising topic against the backdrop of slowing
performance improvement for general-purpose processors due to the foreseeable end of Moore's Law …
Spike-driven transformer
Abstract Spiking Neural Networks (SNNs) provide an energy-efficient deep learning option
due to their unique spike-based event-driven (i.e., spike-driven) paradigm. In this paper, we …
Going deeper with image transformers
Transformers have recently been adapted for large-scale image classification, achieving
high scores and shaking up the long supremacy of convolutional neural networks. However, the …
RepVGG: Making VGG-style ConvNets great again
We present a simple but powerful architecture of convolutional neural network, which has a
VGG-like inference-time body composed of nothing but a stack of 3x3 convolution and …
Dive into deep learning
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …
On layer normalization in the transformer architecture
The Transformer is widely used in natural language processing tasks. To train a Transformer,
however, one usually needs a carefully designed learning rate warm-up stage, which is …
Picking winning tickets before training by preserving gradient flow
Overparameterization has been shown to benefit both the optimization and generalization of
neural networks, but large networks are resource hungry at both training and test time …
Wide neural networks of any depth evolve as linear models under gradient descent
A longstanding goal in deep learning research has been to precisely characterize training
and generalization. However, the often complex loss landscapes of neural networks have …
Pre-training via denoising for molecular property prediction
Many important problems involving molecular property prediction from 3D structures have
limited data, posing a generalization challenge for neural networks. In this paper, we …