SOAP: Improving and Stabilizing Shampoo using Adam
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning
method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks …
Approximated orthogonal projection unit: stabilizing regression network training using natural gradient
Neural networks (NN) are extensively studied in cutting-edge soft sensor models due to their
feature extraction and function approximation capabilities. Current research into network …
Bayesian Online Natural Gradient (BONG)
We propose a novel approach to sequential Bayesian inference based on variational Bayes.
The key insight is that, in the online setting, we do not need to add the KL term to regularize …
Stein Variational Newton Neural Network Ensembles
K. Flöge, M. A. Moeed, V. Fortuin - arXiv preprint arXiv:2411.01887, 2024 - arxiv.org
Deep neural network ensembles are powerful tools for uncertainty quantification, which
have recently been re-interpreted from a Bayesian perspective. However, current methods …
On the Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning
Layer-wise preconditioning methods are a family of memory-efficient optimization algorithms
that introduce preconditioners per axis of each layer's weight tensors. These methods have …
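A minimal NumPy sketch of the shared idea behind layer-wise preconditioning: one preconditioner is kept per axis of a 2-D weight matrix, Shampoo-style. The function name, damping constant eps, and inverse fourth-root exponent are illustrative assumptions, not details taken from this particular paper.

    import numpy as np

    def layerwise_preconditioned_step(W, G, L_acc, R_acc, lr=1e-2, eps=1e-6):
        """One illustrative update for an m x n weight matrix W with gradient G.
        L_acc (m x m) and R_acc (n x n) hold per-axis curvature statistics."""
        L_acc += G @ G.T      # row-axis (output dimension) statistics
        R_acc += G.T @ G      # column-axis (input dimension) statistics

        def inv_root(M, p):
            # Inverse p-th root of a damped symmetric PSD matrix via eigendecomposition.
            vals, vecs = np.linalg.eigh(M + eps * np.eye(M.shape[0]))
            return vecs @ np.diag(vals ** (-1.0 / p)) @ vecs.T

        # Precondition the gradient on both axes, then take a descent step.
        W -= lr * inv_root(L_acc, 4) @ G @ inv_root(R_acc, 4)
        return W, L_acc, R_acc

In this sketch L_acc and R_acc would start as zero matrices (or eps-scaled identities) of shapes (m, m) and (n, n).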
AdaFisher: Adaptive Second Order Optimization via Fisher Information
First-order optimization methods are currently the mainstream in training deep neural
networks (DNNs). Optimizers like Adam incorporate limited curvature information by …
Position: Curvature Matrices Should Be Democratized via Linear Operators
Structured large matrices are prevalent in machine learning. A particularly important class is
curvature matrices like the Hessian, which are central to understanding the loss landscape …
Fast Fractional Natural Gradient Descent using Learnable Spectral Factorizations
Many popular optimization methods can be united through fractional natural gradient
descent (FNGD), which preconditions the gradient with a fractional power of the inverse …
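A minimal NumPy sketch of the fractional preconditioning this abstract describes: the gradient is multiplied by a fractional power of the inverse of a curvature matrix. The curvature matrix is assumed here to be an empirical Fisher built from per-example gradients, and the spectral factorization is an exact eigendecomposition rather than the paper's learnable one; alpha, eps, and the function name are illustrative assumptions.

    import numpy as np

    def fngd_step(theta, grad, per_example_grads, alpha=0.5, lr=1e-2, eps=1e-8):
        """One illustrative FNGD-style step: theta <- theta - lr * F^(-alpha) @ grad.
        alpha = 1 recovers natural gradient descent; alpha = 0 recovers plain SGD."""
        n = per_example_grads.shape[0]
        F = per_example_grads.T @ per_example_grads / n   # empirical Fisher (d x d)
        vals, vecs = np.linalg.eigh(F + eps * np.eye(F.shape[0]))
        precond = vecs @ np.diag(vals ** (-alpha)) @ vecs.T
        return theta - lr * precond @ grad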