Generalization bounds: Perspectives from information theory and PAC-Bayes
A fundamental question in theoretical machine learning is generalization. Over the past
decades, the PAC-Bayesian approach has been established as a flexible framework to …
SGD with large step sizes learns sparse features
We showcase important features of the dynamics of the Stochastic Gradient Descent (SGD)
in the training of neural networks. We present empirical observations that commonly used …
PAC-Bayes compression bounds so tight that they can explain generalization
While there has been progress in developing non-vacuous generalization bounds for deep
neural networks, these bounds tend to be uninformative about why deep learning works. In …
When do flat minima optimizers work?
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods,
have been shown to improve a neural network's generalization performance over stochastic …
(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over $2 …
Can neural nets learn the same model twice? Investigating reproducibility and double descent from the decision boundary perspective
We discuss methods for visualizing neural network decision boundaries and decision
regions. We use these visualizations to investigate issues related to reproducibility and …
Subspace adversarial training
Single-step adversarial training (AT) has received wide attention as it proved to be both
efficient and robust. However, a serious problem of catastrophic overfitting exists, i.e., the …
Stochastic collapse: How gradient noise attracts SGD dynamics towards simpler subnetworks
In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives
overly expressive networks to much simpler subnetworks, thereby dramatically reducing the …
Why neural networks find simple solutions: The many regularizers of geometric complexity
In many contexts, simpler models are preferable to more complex models and the control of
this model complexity is the goal for many methods in machine learning such as …
Noise is not the main factor behind the gap between SGD and Adam on transformers, but sign descent might be
The success of the Adam optimizer on a wide array of architectures has made it the default
in settings where stochastic gradient descent (SGD) performs poorly. However, our …