Activation functions in deep learning: A comprehensive survey and benchmark
Neural networks have shown tremendous growth in recent years to solve numerous
problems. Various types of neural networks have been introduced to deal with different types …
A review of activation function for artificial neural network
The activation function plays an important role in the training and the performance of an
Artificial Neural Network. It provides the necessary non-linear properties to any Artificial …
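As a quick illustration of the non-linearities these surveys catalogue, here is a minimal NumPy sketch (our own example, not code from either paper) of a few commonly covered activation functions; each maps pre-activations elementwise:

```python
import numpy as np

# Common activation functions; the elementwise non-linearity is what
# lets a stacked network represent non-linear decision functions.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    return np.tanh(x)

def relu(x):
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)

if __name__ == "__main__":
    z = np.linspace(-3, 3, 7)
    for fn in (sigmoid, tanh, relu, leaky_relu):
        print(fn.__name__, np.round(fn(z), 3))
```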
Understanding self-supervised learning dynamics without contrastive pairs
While contrastive approaches of self-supervised learning (SSL) learn representations by
minimizing the distance between two augmented views of the same data point (positive …
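The snippet breaks off at the positive-pair term. As a rough sketch under our own assumptions (not the paper's method), the contrastive setting pulls embeddings of two augmented views of the same input together; the non-contrastive dynamics this paper studies drop the negative pairs entirely and keep only this attraction term:

```python
import numpy as np

def l2_normalize(v, eps=1e-8):
    return v / (np.linalg.norm(v, axis=-1, keepdims=True) + eps)

def positive_pair_distance(z1, z2):
    """Mean squared distance between normalized embeddings of two
    augmented views of the same inputs (the positive-pair term)."""
    z1, z2 = l2_normalize(z1), l2_normalize(z2)
    return np.mean(np.sum((z1 - z2) ** 2, axis=-1))

# Toy usage: embeddings of two augmented views of a batch of 4 samples.
rng = np.random.default_rng(0)
z_view1 = rng.normal(size=(4, 16))
z_view2 = z_view1 + 0.1 * rng.normal(size=(4, 16))  # slightly perturbed view
print(positive_pair_distance(z_view1, z_view2))
```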
Dive into deep learning
This open-source book represents our attempt to make deep learning approachable,
teaching readers the concepts, the context, and the code. The entire book is drafted in …
Finite versus infinite neural networks: an empirical study
We perform a careful, thorough, and large scale empirical study of the correspondence
between wide neural networks and kernel methods. By doing so, we resolve a variety of …
ReZero is all you need: Fast convergence at large depth
Deep networks often suffer from vanishing or exploding gradients due to inefficient signal
propagation, leading to long training times or convergence difficulties. Various architecture …
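The snippet ends before the paper's actual proposal. For context, ReZero gates each residual branch with a learnable scalar initialised to zero, so every block starts as the identity map and signal propagates cleanly at large depth. Below is a minimal NumPy sketch of that idea (an illustration only, not the authors' implementation):

```python
import numpy as np

class ReZeroBlock:
    """Residual block in the ReZero style: output = x + alpha * F(x),
    where the learnable scalar alpha starts at zero so the block is
    the identity at initialisation."""

    def __init__(self, dim, rng):
        self.w = rng.normal(scale=dim ** -0.5, size=(dim, dim))
        self.alpha = 0.0  # trained alongside the weights in practice

    def forward(self, x):
        branch = np.tanh(x @ self.w)  # stand-in for the block's sub-layer F
        return x + self.alpha * branch

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 8))
block = ReZeroBlock(8, rng)
print(np.allclose(block.forward(x), x))  # True: identity at initialisation
```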
Understanding the difficulty of training transformers
Transformers have proved effective in many NLP tasks. However, their training requires non-
trivial efforts regarding designing cutting-edge optimizers and learning rate schedulers …
Fixup initialization: Residual learning without normalization
Normalization layers are a staple in state-of-the-art deep neural network architectures. They
are widely believed to stabilize training, enable higher learning rate, accelerate …
Sorting out Lipschitz function approximation
Training neural networks under a strict Lipschitz constraint is useful for provable adversarial
robustness, generalization bounds, interpretable gradients, and Wasserstein distance …
Statistical mechanics of deep learning
The recent striking success of deep neural networks in machine learning raises profound
questions about the theoretical principles underlying their success. For example, what can …