Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks
Recent works have cast some light on the mystery of why deep nets fit any data and
generalize despite being very overparametrized. This paper analyzes training and …
Gradient descent provably optimizes over-parameterized neural networks
One of the mysteries in the success of neural networks is that randomly initialized first-order
methods like gradient descent can achieve zero training loss even though the objective …
Fast neural kernel embeddings for general activations
The infinite-width limit has shed light on generalization and optimization aspects of deep learning
by establishing connections between neural networks and kernel methods. Despite their …
Uncertainty in neural networks: Bayesian ensembling
Understanding the uncertainty of a neural network's (NN) predictions is essential for many
applications. The Bayesian framework provides a principled approach to this, however …
A mathematical theory of relational generalization in transitive inference
Humans and animals routinely infer relations between different items or events and
generalize these relations to novel combinations of items. This allows them to respond …
Expressive priors in Bayesian neural networks: Kernel combinations and periodic functions
A simple, flexible approach to creating expressive priors in Gaussian process (GP) models
makes new kernels from a combination of basic kernels, e.g. summing a periodic and linear …
Periodic activation functions induce stationarity
Neural network models are known to reinforce hidden data biases, making them unreliable
and difficult to interpret. We seek to build models that 'know what they do not know' by …
A connection between probability, physics and neural networks
S Ranftl - Physical Sciences Forum, 2022 - mdpi.com
I illustrate an approach that can be exploited for constructing neural networks that a priori
obey physical laws. We start with a simple single-layer neural network (NN) but refrain from …
Differential training: A generic framework to reduce label noises for android malware detection
A common problem in machine learning-based malware detection is that training data may
contain noisy labels and it is challenging to make the training data noise-free at a large …
Squared neural families: a new class of tractable density models
R Tsuchida, CS Ong… - Advances in neural …, 2024 - proceedings.neurips.cc
Flexible models for probability distributions are an essential ingredient in many machine
learning tasks. We develop and investigate a new class of probability distributions, which we …