SOAP: Improving and Stabilizing Shampoo using Adam

N Vyas, D Morwani, R Zhao, I Shapira… - arXiv preprint arXiv …, 2024 - arxiv.org
There is growing evidence of the effectiveness of Shampoo, a higher-order preconditioning
method, over Adam in deep learning optimization tasks. However, Shampoo's drawbacks …
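
For readers not familiar with the method, Shampoo-style preconditioning keeps one Kronecker-factored statistic per side of each weight matrix and applies inverse fourth roots to the gradient. Below is a minimal NumPy sketch of a single such step; it illustrates plain Shampoo rather than the SOAP algorithm itself, and the learning rate and damping values are assumptions.

```python
import numpy as np

def shampoo_step(W, G, L, R, lr=1e-3, eps=1e-12):
    """One Shampoo-style step for a 2D weight W with gradient G.

    L (m, m) and R (n, n) accumulate the left/right Kronecker factors.
    Illustrative sketch only, not the SOAP authors' implementation.
    """
    L += G @ G.T                          # left statistic
    R += G.T @ G                          # right statistic
    dl, Ul = np.linalg.eigh(L)            # L and R are symmetric PSD
    dr, Ur = np.linalg.eigh(R)
    L_inv4 = Ul @ np.diag((np.maximum(dl, 0.0) + eps) ** -0.25) @ Ul.T
    R_inv4 = Ur @ np.diag((np.maximum(dr, 0.0) + eps) ** -0.25) @ Ur.T
    W -= lr * (L_inv4 @ G @ R_inv4)       # preconditioned update
    return W, L, R
```

Per the title, SOAP's contribution is to combine this kind of preconditioning with Adam to make it more stable; those details are in the paper, not in the sketch above.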

Approximated Orthogonal Projection Unit: Stabilizing Regression Network Training Using Natural Gradient

S Wang, C Yang, S Lou - arXiv preprint arXiv:2409.15393, 2024 - arxiv.org
Neural networks (NN) are extensively studied in cutting-edge soft sensor models due to their
feature extraction and function approximation capabilities. Current research into network …

Bayesian Online Natural Gradient (BONG)

M Jones, P Chang, K Murphy - arXiv preprint arXiv:2405.19681, 2024 - arxiv.org
We propose a novel approach to sequential Bayesian inference based on variational Bayes.
The key insight is that, in the online setting, we do not need to add the KL term to regularize …
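
One way to make that insight concrete (the notation below is mine, not taken from the paper): in online VB the previous posterior plays the role of the prior, and the usual step-t objective carries a KL term back to it. The sketch contrasts that objective with a single natural-gradient step on the expected log-likelihood alone, initialized at the previous posterior, so the regularization enters through the starting point rather than through an explicit KL penalty.

```latex
% Standard online VB objective at step t (previous posterior as prior):
\mathcal{L}_t(\psi) \;=\; \mathbb{E}_{q_\psi}\!\left[-\log p(y_t \mid \theta)\right]
  \;+\; \mathrm{KL}\!\left(q_\psi \,\Vert\, q_{\psi_{t-1}}\right)
% Sketch of dropping the KL term: one natural-gradient step on the expected
% log-likelihood, started from the previous posterior (F is the Fisher matrix
% of the variational family):
\psi_t \;=\; \psi_{t-1} \;+\; \eta\, F(\psi_{t-1})^{-1}\,
  \nabla_\psi\, \mathbb{E}_{q_\psi}\!\left[\log p(y_t \mid \theta)\right]\Big|_{\psi=\psi_{t-1}}
```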

Stein Variational Newton Neural Network Ensembles

K Flöge, MA Moeed, V Fortuin - arXiv preprint arXiv:2411.01887, 2024 - arxiv.org
Deep neural network ensembles are powerful tools for uncertainty quantification, which
have recently been re-interpreted from a Bayesian perspective. However, current methods …

On The Concurrence of Layer-wise Preconditioning Methods and Provable Feature Learning

TT Zhang, B Moniri, A Nagwekar, F Rahman… - arXiv preprint arXiv …, 2025 - arxiv.org
Layer-wise preconditioning methods are a family of memory-efficient optimization algorithms
that introduce preconditioners per axis of each layer's weight tensors. These methods have …
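
Concretely, "a preconditioner per axis" means keeping one small square matrix per mode of a layer's weight tensor and contracting each along its own axis, which is where the memory savings come from. The sketch below shows only that generic structure, assuming nothing about how the per-axis matrices are built; it is not the cited paper's construction (no statistics updates or inverse roots are shown).

```python
import numpy as np

def per_axis_precondition(G, precs):
    """Apply one (d_i, d_i) preconditioner along each axis i of tensor G."""
    out = G
    for axis, P in enumerate(precs):
        # contract P with mode `axis` of `out`, then restore the axis order
        out = np.moveaxis(np.tensordot(P, out, axes=([1], [axis])), 0, axis)
    return out

# Memory comparison motivating the per-axis structure: a full preconditioner
# over a flattened (d1*...*dk)-dimensional layer needs (prod d_i)**2 entries,
# while the per-axis factors need only sum(d_i**2).
```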

AdaFisher: Adaptive Second Order Optimization via Fisher Information

DM Gomes, Y Zhang, E Belilovsky, G Wolf… - arXiv preprint arXiv …, 2024 - arxiv.org
First-order optimization methods are currently the mainstream in training deep neural
networks (DNNs). Optimizers like Adam incorporate limited curvature information by …
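
The "limited curvature information" in Adam is its diagonal second-moment estimate, which acts as a rough diagonal, empirical curvature proxy. The sketch below shows the standard Adam step for contrast; it is not AdaFisher, which per its title builds the preconditioner from Fisher information instead.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Standard Adam update; v is the diagonal curvature proxy."""
    m = b1 * m + (1 - b1) * g          # first moment (momentum)
    v = b2 * v + (1 - b2) * g * g      # second moment ~ diagonal empirical Fisher
    m_hat = m / (1 - b1 ** t)          # bias correction
    v_hat = v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```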

Position: Curvature Matrices Should Be Democratized via Linear Operators

F Dangel, R Eschenhagen, W Ormaniec… - arXiv preprint arXiv …, 2025 - arxiv.org
Structured large matrices are prevalent in machine learning. A particularly important class is
curvature matrices like the Hessian, which are central to understanding the loss landscape …
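
The "linear operator" framing means exposing a curvature matrix only through matrix-vector products instead of materializing it. A minimal sketch, assuming SciPy's LinearOperator and a finite-difference Hessian-vector product (a real setup would use autodiff); the toy quadratic loss and the helper names are illustrative only.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def make_hessian_operator(grad_fn, w, eps=1e-5):
    """Matrix-free Hessian at w, exposed only via Hessian-vector products."""
    n = w.size
    def hvp(v):
        # central finite difference of gradients approximates H @ v
        return (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2 * eps)
    return LinearOperator((n, n), matvec=hvp)

# Toy quadratic loss 0.5 * w^T A w - b^T w, whose gradient is A w - b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
grad_fn = lambda w: A @ w - b

w0 = np.zeros(2)
H = make_hessian_operator(grad_fn, w0)
newton_step, _ = cg(H, grad_fn(w0))   # solve H x = grad without forming H
```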

Fast Fractional Natural Gradient Descent using Learnable Spectral Factorizations

W Lin, F Dangel, R Eschenhagen, J Bae, RE Turner… - openreview.net
Many popular optimization methods can be united through fractional natural gradient
descent (FNGD), which pre-conditions the gradient with a fractional power of the inverse …
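
Written out (notation mine, not necessarily the paper's), the family the snippet describes preconditions the gradient with a fractional power of the inverse Fisher, so that different exponents recover different familiar optimizers.

```latex
% Fractional natural gradient descent step, as the snippet describes it:
\theta_{t+1} \;=\; \theta_t \;-\; \eta\, F_t^{-\alpha}\, \nabla_\theta \mathcal{L}(\theta_t)
% \alpha = 1 is natural gradient descent; \alpha = 1/2 with a diagonal
% empirical Fisher corresponds to Adam-style square-root preconditioning.
```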