Activation functions in deep learning: A comprehensive survey and benchmark

SR Dubey, SK Singh, BB Chaudhuri - Neurocomputing, 2022 - Elsevier
Neural networks have grown tremendously in recent years as a tool for solving numerous
problems. Various types of neural networks have been introduced to deal with different types …

A review of activation function for artificial neural network

AD Rasamoelina, F Adjailia… - 2020 IEEE 18th World …, 2020 - ieeexplore.ieee.org
Activation functions play an important role in the training and performance of an
Artificial Neural Network. They provide the necessary non-linear properties to any Artificial …
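
The non-linearity mentioned in this snippet can be illustrated with a minimal sketch; the functions chosen below (ReLU, sigmoid, tanh) are a common selection of my own, not taken from the cited review.

```python
import torch

# Common activation functions applied element-wise to a pre-activation vector.
# Without such non-linearities, a stack of linear layers collapses into a
# single linear map, which is the property the snippet above refers to.
z = torch.linspace(-3.0, 3.0, steps=7)

relu = torch.relu(z)        # max(0, z)
sigmoid = torch.sigmoid(z)  # 1 / (1 + exp(-z))
tanh = torch.tanh(z)        # (exp(z) - exp(-z)) / (exp(z) + exp(-z))

for name, out in [("relu", relu), ("sigmoid", sigmoid), ("tanh", tanh)]:
    print(name, out.tolist())
```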

Understanding self-supervised learning dynamics without contrastive pairs

Y Tian, X Chen, S Ganguli - International Conference on …, 2021 - proceedings.mlr.press
While contrastive approaches to self-supervised learning (SSL) learn representations by
minimizing the distance between two augmented views of the same data point (positive …
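
As a rough illustration of the setup this paper analyzes, the sketch below computes a negative cosine similarity between two augmented views passed through an encoder and a predictor, with a stop-gradient on the target branch (a BYOL/SimSiam-style non-contrastive objective). The network sizes and module names are placeholders of my own, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Minimal non-contrastive SSL objective, sketched with a toy MLP encoder.
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
predictor = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))

def loss_fn(view1, view2):
    z1, z2 = encoder(view1), encoder(view2)
    p1, p2 = predictor(z1), predictor(z2)
    # Stop-gradient on the target branch; minimize distance between the views.
    sim = F.cosine_similarity(p1, z2.detach(), dim=-1) \
        + F.cosine_similarity(p2, z1.detach(), dim=-1)
    return -0.5 * sim.mean()

x = torch.randn(8, 32)
view1, view2 = x + 0.1 * torch.randn_like(x), x + 0.1 * torch.randn_like(x)
print(loss_fn(view1, view2).item())
```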

Dive into deep learning

A Zhang, ZC Lipton, M Li, AJ Smola - arXiv preprint arXiv:2106.11342, 2021 - arxiv.org
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …

Finite versus infinite neural networks: an empirical study

J Lee, S Schoenholz, J Pennington… - Advances in …, 2020 - proceedings.neurips.cc
We perform a careful, thorough, and large-scale empirical study of the correspondence
between wide neural networks and kernel methods. By doing so, we resolve a variety of …
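
One concrete instance of the wide-network/kernel correspondence is the closed-form NNGP kernel of an infinitely wide, single-hidden-layer ReLU network. The sketch below uses the standard arc-cosine form and plugs it into kernel ridge regression; it is my own illustration under the assumption of unit-variance Gaussian weights, not code from the paper.

```python
import numpy as np

def relu_nngp_kernel(X, Y):
    """NNGP kernel of an infinitely wide one-hidden-layer ReLU network
    (arc-cosine kernel of degree 1), assuming unit-variance Gaussian weights."""
    nx = np.linalg.norm(X, axis=1, keepdims=True)   # (n, 1)
    ny = np.linalg.norm(Y, axis=1, keepdims=True)   # (m, 1)
    cos = np.clip((X @ Y.T) / (nx * ny.T), -1.0, 1.0)
    theta = np.arccos(cos)
    return (nx * ny.T) / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))

# Kernel ridge regression with this kernel mimics the infinitely wide network.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(20, 5)), rng.normal(size=(20, 1))
X_test = rng.normal(size=(3, 5))
K = relu_nngp_kernel(X_train, X_train) + 1e-3 * np.eye(20)  # small ridge term
preds = relu_nngp_kernel(X_test, X_train) @ np.linalg.solve(K, y_train)
print(preds.ravel())
```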

Rezero is all you need: Fast convergence at large depth

T Bachlechner, BP Majumder, H Mao… - Uncertainty in …, 2021 - proceedings.mlr.press
Deep networks often suffer from vanishing or exploding gradients due to inefficient signal
propagation, leading to long training times or convergence difficulties. Various architecture …
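
The remedy proposed in this paper is simple enough to sketch: each residual branch is scaled by a learnable scalar initialized to zero, so every layer starts as the identity and signal propagates cleanly through arbitrary depth. The block below is a minimal PyTorch sketch of that idea; the inner branch is a placeholder of my own.

```python
import torch
import torch.nn as nn

class ReZeroBlock(nn.Module):
    """Residual block with a learnable scalar gate, x + alpha * F(x),
    where alpha starts at zero so the block is the identity at initialization."""
    def __init__(self, dim):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.alpha = nn.Parameter(torch.zeros(1))  # the ReZero scalar

    def forward(self, x):
        return x + self.alpha * self.branch(x)

x = torch.randn(4, 16)
block = ReZeroBlock(16)
print(torch.allclose(block(x), x))  # True at initialization: identity mapping
```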

Understanding the difficulty of training transformers

L Liu, X Liu, J Gao, W Chen, J Han - arXiv preprint arXiv:2004.08249, 2020 - arxiv.org
Transformers have proved effective in many NLP tasks. However, their training requires non-
trivial effort in designing cutting-edge optimizers and learning rate schedulers …
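
One of the scheduler choices alluded to here is the warmup-then-decay learning-rate schedule commonly used for Transformers. The sketch below implements the familiar inverse-square-root variant with linear warmup as an illustration; the d_model and warmup_steps values are arbitrary, and this is the generic schedule rather than the cited paper's own method.

```python
import torch

def transformer_lr(step, d_model=512, warmup_steps=4000):
    # Linear warmup for warmup_steps, then decay proportional to step**-0.5.
    step = max(step, 1)
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

model = torch.nn.Linear(512, 512)
opt = torch.optim.Adam(model.parameters(), lr=1.0)  # base lr is scaled by the lambda
sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=transformer_lr)

for step in range(5):
    opt.step()
    sched.step()
    print(step, sched.get_last_lr()[0])
```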

Fixup initialization: Residual learning without normalization

H Zhang, YN Dauphin, T Ma - arXiv preprint arXiv:1901.09321, 2019 - arxiv.org
Normalization layers are a staple in state-of-the-art deep neural network architectures. They
are widely believed to stabilize training, enable higher learning rates, accelerate …
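
Fixup's recipe, roughly, is to drop the normalization layers and instead rescale the residual-branch initialization with depth, zero-initializing the last layer of each branch so every block starts near the identity. The block below is a rough sketch of that idea for a two-layer MLP branch; the scaling follows the paper's L^(-1/(2m-2)) rule with m = 2, but the architecture and sizes are placeholders of my own.

```python
import math
import torch
import torch.nn as nn

class FixupBlock(nn.Module):
    """Normalization-free residual block with Fixup-style initialization:
    scale the first layer by num_blocks**(-1/(2m-2)) for m = 2 layers per
    branch, and zero-initialize the last layer so the block starts as identity."""
    def __init__(self, dim, num_blocks):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)
        self.relu = nn.ReLU()
        # Fixup-style rescaling of the branch initialization.
        nn.init.normal_(self.fc1.weight, std=math.sqrt(2.0 / dim) * num_blocks ** -0.5)
        nn.init.zeros_(self.fc2.weight)
        nn.init.zeros_(self.fc2.bias)

    def forward(self, x):
        return x + self.fc2(self.relu(self.fc1(x)))

x = torch.randn(4, 32)
net = nn.Sequential(*[FixupBlock(32, num_blocks=8) for _ in range(8)])
print(net(x).std().item())  # activations remain well-scaled without normalization
```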

Sorting out Lipschitz function approximation

C Anil, J Lucas, R Grosse - International Conference on …, 2019 - proceedings.mlr.press
Training neural networks under a strict Lipschitz constraint is useful for provable adversarial
robustness, generalization bounds, interpretable gradients, and Wasserstein distance …
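
A key ingredient in this paper is a gradient-norm-preserving activation for Lipschitz-constrained networks. The sketch below implements a GroupSort-style activation (sorting within groups of the feature dimension, with group size 2 giving the MaxMin variant); it is my reading of the idea rather than the authors' code.

```python
import torch

def group_sort(x, group_size=2):
    """GroupSort activation: split the last dimension into groups and sort
    within each group. With group_size=2 this is the MaxMin activation.
    Being a per-row permutation of coordinates, it preserves the L2 norm,
    unlike ReLU, which discards the negative part of the signal."""
    batch, features = x.shape
    assert features % group_size == 0
    grouped = x.reshape(batch, features // group_size, group_size)
    return grouped.sort(dim=-1).values.reshape(batch, features)

x = torch.randn(4, 8)
y = group_sort(x)
print(torch.allclose(x.norm(dim=1), y.norm(dim=1)))  # True: norms are preserved
```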

Statistical mechanics of deep learning

Y Bahri, J Kadmon, J Pennington… - Annual Review of …, 2020 - annualreviews.org
The recent striking success of deep neural networks in machine learning raises profound
questions about the theoretical principles underlying their success. For example, what can …