Quantifying the impact of label noise on federated learning
Federated Learning (FL) is a distributed machine learning paradigm where clients
collaboratively train a model using their local (human-generated) datasets. While existing …
The representation theory of neural networks
In this work, we show that neural networks can be represented via the mathematical theory
of quiver representations. More specifically, we prove that a neural network is a quiver …
G-SGD: Optimizing ReLU Neural Networks in its Positively Scale-Invariant Space
It is well known that neural networks with rectified linear units (ReLU) activation functions are
positively scale-invariant. Conventional algorithms like stochastic gradient descent optimize …
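To make the positive scale-invariance concrete: for any c > 0, multiplying the incoming weights of a hidden ReLU unit by c and dividing its outgoing weights by c leaves the network function unchanged, because relu(c·z) = c·relu(z). A minimal numpy sketch of that property (this is only the invariance itself, not the paper's G-SGD algorithm; the toy two-layer net and values are assumptions for illustration):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# Two-layer ReLU net f(x) = W2 @ relu(W1 @ x) (toy sizes, no biases).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))   # hidden x input
W2 = rng.normal(size=(2, 4))   # output x hidden
x = rng.normal(size=3)

c = 5.0                        # any positive rescaling factor
W1s, W2s = W1.copy(), W2.copy()
W1s[0, :] *= c                 # scale incoming weights of hidden unit 0 ...
W2s[:, 0] /= c                 # ... and inversely scale its outgoing weights

f_orig = W2 @ relu(W1 @ x)
f_resc = W2s @ relu(W1s @ x)
print(np.allclose(f_orig, f_resc))   # True: relu(c*z) = c*relu(z) for c > 0
```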
A path-norm toolkit for modern networks: consequences, promises and challenges
This work introduces the first toolkit around path-norms that is fully able to encompass
general DAG ReLU networks with biases, skip connections and any operation based on the …
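For intuition about what a path-norm is: in the classical special case of a plain layered ReLU network without biases, the L1 path-norm is the sum over all input-output paths of the product of absolute weights along the path, and it can be computed by chaining element-wise absolute weight matrices. A minimal sketch of that special case only (the paper's toolkit targets far more general DAG networks with biases, skip connections and pooling-type operations):

```python
import numpy as np

def l1_path_norm(weights):
    """L1 path-norm of a plain layered ReLU net without biases:
    the sum over all input-output paths of the product of |weight|
    along the path, obtained by chaining |W_L| ... |W_1| and
    summing all entries of the resulting matrix."""
    acc = np.abs(weights[0])
    for W in weights[1:]:
        acc = np.abs(W) @ acc
    return float(acc.sum())

rng = np.random.default_rng(0)
W1 = rng.normal(size=(5, 3))   # first layer: 3 inputs -> 5 hidden units
W2 = rng.normal(size=(2, 5))   # second layer: 5 hidden units -> 2 outputs
print(l1_path_norm([W1, W2]))  # sums over the 3 * 5 * 2 input-output paths
```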
Positively scale-invariant flatness of ReLU neural networks
It was empirically confirmed by Keskar et al. \cite{SharpMinima} that flatter minima
generalize better. However, for the popular ReLU network, a sharp minimum can also …
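The rescaling argument behind this observation can be reproduced on a toy example: a positive rescaling of one hidden unit leaves the network function, and hence the minimum, unchanged, yet a fixed-size perturbation of the rescaled outgoing weight now changes the loss far more, so the same minimum looks much sharper in conventional (non-scale-invariant) coordinates. A hedged numpy sketch under assumed toy values (this is the generic rescaling argument, not the paper's PSI-flatness measure):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def loss(W1, W2, x, y):
    return 0.5 * np.sum((W2 @ relu(W1 @ x) - y) ** 2)

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x = rng.normal(size=3)
if W1[0] @ x <= 0:             # keep hidden unit 0 active so the effect is visible
    x = -x
y = W2 @ relu(W1 @ x)          # choose the target so this point has zero loss

c = 100.0                      # same function, very different parameters
W1s, W2s = W1.copy(), W2.copy()
W1s[0, :] *= c
W2s[:, 0] /= c

eps = 1e-2                     # fixed-size perturbation of the outgoing weight
for name, (A, B) in {"original": (W1, W2), "rescaled": (W1s, W2s)}.items():
    Bp = B.copy()
    Bp[:, 0] += eps
    print(f"{name}: loss at minimum = {loss(A, B, x, y):.3e}, "
          f"after perturbing W2[:, 0] = {loss(A, Bp, x, y):.3e}")
```

Both parameterizations sit at a zero-loss minimum, but the rescaled one incurs roughly c² times the loss under the same perturbation, i.e. it is "sharper" without generalizing any differently.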
A priori estimates of the population risk for residual networks
C. Ma, Q. Wang - arXiv preprint arXiv:1903.02154, 2019 - arxiv.org
Optimal a priori estimates are derived for the population risk, also known as the
generalization error, of a regularized residual network model. An important part of the …
ReLU soothes the NTK condition number and accelerates optimization for wide neural networks
Rectified linear unit (ReLU), as a non-linear activation function, is well known to improve the
expressivity of neural networks such that any continuous function can be approximated to …
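One way to probe this kind of claim numerically is to compare the condition number of the empirical (finite-width) NTK Gram matrix K = J Jᵀ of a two-layer network under ReLU versus a linear (identity) activation. The sketch below is only an illustration under assumed settings (width 4096, unit-norm Gaussian inputs, scalar output); the paper's analysis concerns the infinite-width NTK. In this toy setting the linear network's kernel is numerically singular because the number of samples exceeds the input dimension, while the ReLU kernel stays well-conditioned:

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def empirical_ntk(X, W, a, act, d_act):
    """Empirical (finite-width) NTK Gram matrix K = J @ J.T for the scalar
    two-layer net f(x) = a @ act(W @ x) / sqrt(m); each row of J is the
    gradient of f(x_i) with respect to all parameters (W and a)."""
    m = W.shape[0]
    rows = []
    for x in X:
        z = W @ x
        grad_a = act(z) / np.sqrt(m)                                 # df/da
        grad_W = (a * d_act(z))[:, None] * x[None, :] / np.sqrt(m)   # df/dW
        rows.append(np.concatenate([grad_W.ravel(), grad_a]))
    J = np.stack(rows)
    return J @ J.T

rng = np.random.default_rng(0)
n, d, m = 20, 10, 4096                          # samples, input dim, width
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
W = rng.normal(size=(m, d))                     # first-layer weights
a = rng.choice([-1.0, 1.0], size=m)             # second-layer signs

K_relu = empirical_ntk(X, W, a, relu, lambda z: (z > 0).astype(float))
K_lin  = empirical_ntk(X, W, a, lambda z: z, np.ones_like)
print("condition number with ReLU:    ", np.linalg.cond(K_relu))
print("condition number with identity:", np.linalg.cond(K_lin))
```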