An overview and comparative analysis of recurrent neural networks for short term load forecasting

FM Bianchi, E Maiorino, MC Kampffmeyer… - arXiv preprint arXiv …, 2017 - arxiv.org
The key component in forecasting demand and consumption of resources in a supply
network is an accurate prediction of real-valued time series. Indeed, both service …
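For concreteness, here is a minimal sketch of one of the recurrent architectures such comparative studies typically include, an echo state network doing one-step-ahead prediction of a real-valued series. The synthetic series, reservoir size, and hyperparameters are illustrative choices, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "load" series with a daily-like period plus noise (illustrative only).
t = np.arange(2000)
y = np.sin(2 * np.pi * t / 96) + 0.1 * rng.standard_normal(t.size)

n_res, rho, ridge = 300, 0.9, 1e-6
W_in = rng.uniform(-0.5, 0.5, (n_res, 1))
W = rng.standard_normal((n_res, n_res))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))      # rescale to spectral radius rho

# Drive the fixed random reservoir with the series and collect its states.
states = np.zeros((t.size, n_res))
x = np.zeros(n_res)
for i in range(1, t.size):
    x = np.tanh(W_in @ y[i - 1:i] + W @ x)
    states[i] = x                                    # state has seen y[0..i-1]

# Ridge-regression readout: the state at time i predicts y[i] (one step ahead).
washout, split = 100, 1500
X_tr, y_tr = states[washout:split], y[washout:split]
W_out = np.linalg.solve(X_tr.T @ X_tr + ridge * np.eye(n_res), X_tr.T @ y_tr)

pred = states[split:] @ W_out
print("test MSE:", float(np.mean((pred - y[split:]) ** 2)))
```

Only the linear readout is trained here; the reservoir weights stay fixed, which is what makes this family of models cheap to fit compared with fully trained recurrent networks.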

Gradient descent finds global minima of deep neural networks

S Du, J Lee, H Li, L Wang… - … conference on machine …, 2019 - proceedings.mlr.press
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …
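Guarantees in this line of work are usually stated as linear convergence of the training loss once the network is sufficiently overparameterized. A hedged paraphrase of the generic form (exact constants, width requirements, and the precise Gram matrix vary by architecture and paper):

```latex
\|\mathbf{y}-\mathbf{u}(k)\|_2^2
  \;\le\;
  \Bigl(1-\tfrac{\eta\,\lambda_{\min}(\mathbf{K})}{2}\Bigr)^{k}
  \,\|\mathbf{y}-\mathbf{u}(0)\|_2^2
```

where u(k) collects the network's training-set predictions after k gradient steps, η is the step size, and K is the Gram (neural-tangent-kernel-style) matrix at initialization, assumed positive definite.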

Wide neural networks of any depth evolve as linear models under gradient descent

J Lee, L Xiao, S Schoenholz, Y Bahri… - Advances in neural …, 2019 - proceedings.neurips.cc
A longstanding goal in deep learning research has been to precisely characterize training
and generalization. However, the often complex loss landscapes of neural networks have …
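The linearization underlying this result is the first-order Taylor expansion of the network in its parameters around initialization; under squared loss, the training dynamics of this linear model are governed by the empirical neural tangent kernel:

```latex
f^{\mathrm{lin}}_{t}(x) \;=\; f_{0}(x) \;+\; \nabla_{\theta} f_{0}(x)^{\top}\,(\theta_{t}-\theta_{0}),
\qquad
\hat{\Theta}_{0}(x,x') \;=\; \nabla_{\theta} f_{0}(x)^{\top}\,\nabla_{\theta} f_{0}(x').
```

The claim of the paper is that, as width grows, the full nonlinear network's training trajectory stays close to that of this linearized model.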

Deep neural networks as gaussian processes

J Lee, Y Bahri, R Novak, SS Schoenholz… - arXiv preprint arXiv …, 2017 - arxiv.org
It has long been known that a single-layer fully-connected neural network with an iid prior
over its parameters is equivalent to a Gaussian process (GP), in the limit of infinite network …
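The extension to depth works through a layer-wise kernel recursion; schematically, for a fully-connected network with nonlinearity φ, weight variance σ_w², and bias variance σ_b²:

```latex
K^{(\ell)}(x,x') \;=\; \sigma_b^{2} \;+\; \sigma_w^{2}\,
\mathbb{E}_{f\sim\mathcal{GP}\left(0,\,K^{(\ell-1)}\right)}
\bigl[\phi\left(f(x)\right)\phi\left(f(x')\right)\bigr],
```

so that in the infinite-width limit the outputs of an L-layer network form a Gaussian process with covariance K^{(L)}.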

[BOOK] The principles of deep learning theory

DA Roberts, S Yaida, B Hanin - 2022 - cambridge.org
This textbook establishes a theoretical framework for understanding deep learning models
of practical relevance. With an approach that borrows from theoretical physics, Roberts and …

Generative learning for nonlinear dynamics

W Gilpin - Nature Reviews Physics, 2024 - nature.com
Modern generative machine learning models are able to create realistic outputs far beyond
their training data, such as photorealistic artwork, accurate protein structures or …

Understanding batch normalization

N Bjorck, CP Gomes, B Selman… - Advances in neural …, 2018 - proceedings.neurips.cc
Batch normalization (BN) is a technique to normalize activations in intermediate layers of
deep neural networks. Its tendency to improve accuracy and speed up training has …
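For reference, the normalization itself is simple; a minimal NumPy sketch of the training-mode forward pass (running statistics and the backward pass are omitted):

```python
import numpy as np

# Batch normalization: training-mode forward pass for a fully-connected layer.
# gamma/beta are the learnable scale and shift; eps avoids division by zero.
def batch_norm(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                      # per-feature batch mean
    var = x.var(axis=0)                      # per-feature batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)    # zero-mean, unit-variance features
    return gamma * x_hat + beta              # learnable rescale and shift

x = 5.0 * np.random.randn(64, 128) + 3.0     # pre-activations with poor scale
out = batch_norm(x, gamma=np.ones(128), beta=np.zeros(128))
print(out.mean(axis=0)[:3], out.std(axis=0)[:3])   # means ~0, stds ~1
```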

Understanding plasticity in neural networks

C Lyle, Z Zheng, E Nikishin, BA Pires… - International …, 2023 - proceedings.mlr.press
Plasticity, the ability of a neural network to quickly change its predictions in response to new
information, is essential for the adaptability and robustness of deep reinforcement learning …
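One common way plasticity is probed (a sketch of the general protocol, not the paper's exact experimental setup) is to keep training a single network on a stream of phases whose targets are re-randomized at every phase and to check whether the loss reachable within each phase degrades over time:

```python
import torch

torch.manual_seed(0)
x = torch.randn(512, 32)                      # fixed inputs; sizes are illustrative

net = torch.nn.Sequential(
    torch.nn.Linear(32, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 256), torch.nn.ReLU(),
    torch.nn.Linear(256, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

for phase in range(10):
    y = torch.randn(512, 1)                   # fresh random targets each phase
    for _ in range(500):
        loss = torch.nn.functional.mse_loss(net(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    # If plasticity degrades, the loss reachable within a phase creeps upward.
    print(f"phase {phase}: end-of-phase loss {loss.item():.4f}")
```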

The shaped transformer: Attention models in the infinite depth-and-width limit

L Noci, C Li, M Li, B He, T Hofmann… - Advances in …, 2024 - proceedings.neurips.cc
In deep learning theory, the covariance matrix of the representations serves as a proxy to
examine the network's trainability. Motivated by the success of Transformers, we study the …
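The diagnostic referred to here can be illustrated with a toy depth sweep: push a batch of inputs through a stack of random layers and track how the off-diagonal correlations of the representation covariance saturate with depth (rank collapse). The plain ReLU MLP below is an illustrative stand-in, not the attention model studied in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, depth = 8, 512, 50
X = rng.standard_normal((n, d))

for layer in range(1, depth + 1):
    W = rng.standard_normal((d, d)) * np.sqrt(2.0 / d)   # He-style init
    X = np.maximum(X @ W, 0.0)                           # ReLU layer
    if layer % 10 == 0:
        C = X @ X.T / d                                   # representation covariance
        corr = C / np.sqrt(np.outer(np.diag(C), np.diag(C)))
        off_diag = corr[~np.eye(n, dtype=bool)].mean()
        print(f"layer {layer:2d}: mean off-diagonal correlation {off_diag:.3f}")
```

As depth grows, the off-diagonal correlations drift toward 1, i.e. different inputs become indistinguishable to the network; shaping the architecture is aimed at keeping this covariance well conditioned at large depth.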

How good is the Bayes posterior in deep neural networks really?

F Wenzel, K Roth, BS Veeling, J Świątkowski… - arXiv preprint arXiv …, 2020 - arxiv.org
During the past five years the Bayesian deep learning community has developed
increasingly accurate and efficient approximate inference procedures that allow for …