Convergence of Adam under relaxed assumptions

H Li, A Rakhlin, A Jadbabaie - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimation
(Adam) algorithm for a wide class of optimization objectives. Despite the popularity and …
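
For reference across the entries below, a minimal sketch of the Adam update these papers analyze (Kingma & Ba, 2015); the hyperparameter defaults shown are the commonly used ones, not values taken from this paper:

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update on parameters theta given stochastic gradient g at step t >= 1."""
    m = beta1 * m + (1 - beta1) * g          # first moment: EMA of gradients
    v = beta2 * v + (1 - beta2) * g * g      # second moment: EMA of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias corrections for zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```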

A sufficient condition for convergences of Adam and RMSProp

F Zou, L Shen, Z Jie, W Zhang… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Adam and RMSProp are two of the most influential adaptive stochastic algorithms for
training deep neural networks, yet both have been shown to diverge even in the …
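
RMSProp, the second algorithm this entry covers, differs from Adam by tracking only the second moment; a minimal sketch in its common exponential-moving-average form (the hyperparameter values are illustrative):

```python
import numpy as np

def rmsprop_step(theta, g, v, lr=1e-3, beta2=0.99, eps=1e-8):
    """One RMSProp update: scale the raw gradient by an EMA of its squared magnitude."""
    v = beta2 * v + (1 - beta2) * g * g
    return theta - lr * g / (np.sqrt(v) + eps), v
```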

Adam can converge without any modification on update rules

Y Zhang, C Chen, N Shi, R Sun… - Advances in neural …, 2022 - proceedings.neurips.cc
Ever since Reddi et al. (2019) pointed out the divergence issue of Adam, many
new variants have been designed to obtain convergence. However, vanilla Adam remains …
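
The divergence issue referenced here comes from a simple one-dimensional online problem; below is a numerical sketch of its simplest instance, with beta1 = 0 and beta2 = 1/(1+C^2) following Reddi et al.'s construction and bias correction omitted as in the variant they analyze (the constant C and step-size schedule are illustrative):

```python
import numpy as np

def adam_on_divergence_example(C=10.0, T=30000, lr=0.5):
    """f_t(x) = C*x when t % 3 == 1, else -x, over [-1, 1]; the optimum is x = -1.
    With beta2 = 1/(1 + C^2), the rare large gradient C is damped by a large v
    while the frequent -1 gradients are not, so the iterate drifts toward +1."""
    beta2 = 1.0 / (1.0 + C * C)
    x, v = 0.0, 0.0
    for t in range(1, T + 1):
        g = C if t % 3 == 1 else -1.0       # f_t is linear, so its gradient is constant
        v = beta2 * v + (1 - beta2) * g * g
        x -= (lr / np.sqrt(t)) * g / np.sqrt(v)
        x = min(1.0, max(-1.0, x))          # project back onto the feasible set
    return x

print(adam_on_divergence_example())  # ends near +1, the suboptimal boundary
```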

Why are adaptive methods good for attention models?

J Zhang, SP Karimireddy, A Veit… - Advances in …, 2020 - proceedings.neurips.cc
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning,
adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across …
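
For comparison with plain SGD, a minimal sketch of the global-norm gradient clipping used in Clipped SGD (the threshold value here is illustrative):

```python
import numpy as np

def clipped_sgd_step(theta, g, lr=0.1, clip=1.0):
    """SGD update with the gradient rescaled so its norm never exceeds the threshold."""
    norm = np.linalg.norm(g)
    if norm > clip:
        g = g * (clip / norm)
    return theta - lr * g
```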

Adaptive learning: a cluster-based literature review (2011-2022)

LO Fadieieva - Educational Technology Quarterly, 2023 - acnsci.org
Adaptive learning is a personalized instruction system that adjusts to the needs,
preferences, and progress of learners. This paper reviews the current and future …

A survey of synthetic data augmentation methods in machine vision

A Mumuni, F Mumuni, NK Gerrar - Machine Intelligence Research, 2024 - Springer
The standard approach to tackling computer vision problems is to train deep convolutional
neural network (CNN) models using large-scale image datasets that are representative of …

Provable adaptivity of Adam under non-uniform smoothness

B Wang, Y Zhang, H Zhang, Q Meng, R Sun… - Proceedings of the 30th …, 2024 - dl.acm.org
Adam is widely adopted in practical applications due to its fast convergence. However, its
theoretical analysis is still far from satisfactory. Existing convergence analyses for Adam rely …
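
The non-uniform smoothness relaxation referred to here is typically the $(L_0, L_1)$-smoothness condition of Zhang et al. (2020), which lets the local smoothness grow with the gradient norm:

$$\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|,$$

recovering standard $L$-smoothness when $L_1 = 0$.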

Closing the gap between the upper bound and lower bound of Adam's iteration complexity

B Wang, J Fu, H Zhang, N Zheng… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recently, Arjevani et al. [1] established a lower bound on the iteration complexity of
first-order optimization under an $L$-smooth condition and a bounded noise variance …
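
The lower bound in question, in its commonly cited form: for an $L$-smooth objective with initial suboptimality gap $\Delta$ and stochastic gradients of variance at most $\sigma^2$, any first-order method needs

$$\Omega\!\left(\frac{\Delta L \sigma^2}{\epsilon^4}\right)$$

stochastic gradient queries to find a point with $\mathbb{E}\|\nabla f(x)\| \le \epsilon$.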

An accurate GRU-based power time-series prediction approach with selective state updating and stochastic optimization

W Zheng, G Chen - IEEE Transactions on Cybernetics, 2021 - ieeexplore.ieee.org
Accurate power time-series prediction is an important application for building new
industrialized smart cities. The gated recurrent units (GRUs) models have been successfully …
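
As background for the selective state updating proposed here, the standard GRU cell (Cho et al., 2014) that it builds on, with bias terms omitted; note that references differ on whether $z_t$ gates the old or the new state, and the convention below keeps the previous state when $z_t$ is large:

$$
\begin{aligned}
z_t &= \sigma(W_z x_t + U_z h_{t-1}), \\
r_t &= \sigma(W_r x_t + U_r h_{t-1}), \\
\tilde h_t &= \tanh\!\left(W_h x_t + U_h (r_t \odot h_{t-1})\right), \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde h_t.
\end{aligned}
$$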

Why Adam beats SGD for attention models

J Zhang, SP Karimireddy, A Veit, S Kim, SJ Reddi… - 2019 - openreview.net
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning,
adaptive methods like Adam have been observed to outperform SGD across important tasks …