A comprehensive survey on training acceleration for large machine learning models in IoT
The ever-growing artificial intelligence (AI) applications have greatly reshaped our world in
many areas, e.g., smart home, computer vision, natural language processing, etc. Behind …
Why are adaptive methods good for attention models?
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning,
adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across …
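The clipping rule mentioned here is, in its generic form, global-norm gradient clipping applied before an SGD step. Below is a minimal NumPy sketch of that standard rule on a toy quadratic; it is an illustration only, not the specific variant studied in the cited work.

```python
import numpy as np

def clipped_sgd_step(w, grad, lr=0.1, clip_norm=1.0):
    """One SGD step with global-norm gradient clipping (generic form)."""
    norm = np.linalg.norm(grad)
    if norm > clip_norm:
        grad = grad * (clip_norm / norm)  # rescale so the gradient norm is at most clip_norm
    return w - lr * grad

# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient at w is w.
w = np.array([3.0, -4.0])
for _ in range(100):
    w = clipped_sgd_step(w, grad=w)
```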
Faster adaptive federated learning
Federated learning has attracted increasing attention with the emergence of distributed data.
While extensive federated learning algorithms have been proposed for the non-convex …
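For orientation, the baseline that most non-convex federated learning algorithms build on is a FedAvg-style loop: clients run a few local SGD steps and the server averages the returned models. The sketch below shows only that generic pattern; the synthetic clients and least-squares objective are illustrative assumptions, not details from the cited work.

```python
import numpy as np

def local_sgd(w, data, lr=0.05, steps=5):
    """Client update: a few SGD steps on a local least-squares objective."""
    X, y = data
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg_round(w_global, client_datasets):
    """Server update: average the models returned by all clients."""
    local_models = [local_sgd(w_global.copy(), data) for data in client_datasets]
    return np.mean(local_models, axis=0)

# Toy usage with two synthetic clients.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(2)]
w = np.zeros(3)
for _ in range(10):
    w = fedavg_round(w, clients)
```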
AdaGrad avoids saddle points
Adaptive first-order methods in optimization have widespread ML applications due to their
ability to adapt to non-convex landscapes. However, their convergence guarantees are …
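The AdaGrad method referenced here uses the standard diagonal update: each coordinate's step size is scaled by the square root of its accumulated squared gradients. A minimal sketch of that textbook rule on a toy quadratic follows; it does not reproduce the paper's saddle-point analysis.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.5, eps=1e-8):
    """Diagonal AdaGrad: scale each coordinate by the root of its accumulated squared gradients."""
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy usage on f(w) = 0.5 * ||w||^2 (gradient at w is w).
w, accum = np.array([2.0, -1.0]), np.zeros(2)
for _ in range(200):
    w, accum = adagrad_step(w, grad=w, accum=accum)
```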
Deep equilibrium nets
M Azinovic, L Gaegauf… - International Economic …, 2022 - Wiley Online Library
We introduce deep equilibrium nets (DEQNs)—a deep learning‐based method to compute
approximate functional rational expectations equilibria of economic models featuring a …
Why Adam beats SGD for attention models
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning,
adaptive methods like Adam have been observed to outperform SGD across important tasks …
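The Adam update these comparisons refer to is the standard bias-corrected moment rule of Kingma and Ba. The following minimal sketch shows that generic update on a toy quadratic; it is not the cited paper's method or experimental setup.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)  # bias correction for the second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage on f(w) = 0.5 * ||w||^2 (gradient at w is w).
w, m, v = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2)
for t in range(1, 501):
    w, m, v = adam_step(w, grad=w, m=m, v=v, t=t)
```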
Explicit regularization in overparametrized models via noise injection
Injecting noise within gradient descent has several desirable features, such as smoothing
and regularizing properties. In this paper, we investigate the effects of injecting noise before …
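As a generic illustration of noise injection in gradient descent, the sketch below perturbs the iterate with Gaussian noise before each gradient evaluation (perturbed gradient descent). The Gaussian noise model and the toy quadratic objective are assumptions for illustration, not the estimator analyzed in the cited paper.

```python
import numpy as np

def noisy_gd_step(w, grad_fn, rng, lr=0.1, noise_std=0.01):
    """Perturbed gradient descent: evaluate the gradient at a noise-perturbed iterate."""
    w_perturbed = w + rng.normal(scale=noise_std, size=w.shape)
    return w - lr * grad_fn(w_perturbed)

# Toy usage on f(w) = 0.5 * ||w||^2 (the gradient map is the identity).
rng = np.random.default_rng(1)
w = np.array([1.5, -0.5])
for _ in range(100):
    w = noisy_gd_step(w, grad_fn=lambda x: x, rng=rng)
```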
Self-organizing radial basis function neural network using accelerated second-order learning algorithm
HG Han, ML Ma, HY Yang, JF Qiao - Neurocomputing, 2022 - Elsevier
Gradient-based algorithms are commonly used for training radial basis function neural
networks (RBFNNs). However, it is still difficult to avoid the vanishing gradient and improve the …
Calibrating the adaptive learning rate to improve convergence of ADAM
Adaptive gradient methods (AGMs) have been widely used to optimize nonconvex problems
in the deep learning area. We identify two aspects of AGMs that can be further improved …
Decentralized Riemannian algorithm for nonconvex minimax problems
The minimax optimization over Riemannian manifolds (possibly with nonconvex constraints) has
been actively applied to solve many problems, such as robust dimensionality reduction and …