Auto-encoders in deep learning—a review with new perspectives

S Chen, W Guo - Mathematics, 2023 - mdpi.com
Deep learning, which is a subfield of machine learning, has opened a new era for the
development of neural networks. The auto-encoder is a key component of deep structure …

Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …

Why transformers need adam: A hessian perspective

Y Zhang, C Chen, T Ding, Z Li… - Advances in Neural …, 2025 - proceedings.neurips.cc
SGD performs worse than Adam by a significant margin on Transformers, but the reason
remains unclear. In this work, we provide an explanation through the lens of Hessian: (i) …

MonkeyNet: A robust deep convolutional neural network for monkeypox disease detection and classification

D Bala, MS Hossain, MA Hossain, MI Abdullah… - Neural Networks, 2023 - Elsevier
The monkeypox virus poses a new pandemic threat while we are still recovering from
COVID-19. Despite the fact that monkeypox is not as lethal and contagious as COVID-19 …

Group knowledge transfer: Federated learning of large cnns at the edge

C He, M Annavaram… - Advances in neural …, 2020 - proceedings.neurips.cc
Scaling up the convolutional neural network (CNN) size (e.g., width, depth, etc.) is known to
effectively improve model accuracy. However, the large model size impedes training on …

Convergence of adam under relaxed assumptions

H Li, A Rakhlin, A Jadbabaie - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimate
(Adam) algorithm for a wide class of optimization objectives. Despite the popularity and …

Facial emotion recognition: State of the art performance on FER2013

Y Khaireddin, Z Chen - arXiv preprint arXiv:2105.03588, 2021 - arxiv.org
Facial emotion recognition (FER) is significant for human-computer interaction such as
clinical practice and behavioral description. Accurate and robust FER by computer models …

Comparison of optimization techniques based on gradient descent algorithm: A review

SH Haji, AM Abdulazeez - PalArch's Journal of Archaeology of …, 2021 - researchgate.net
Whether you deal with a real-life issue or create a software product, optimization is
constantly the ultimate goal. This goal, however, is achieved by utilizing one of the …

Adam can converge without any modification on update rules

Y Zhang, C Chen, N Shi, R Sun… - Advances in neural …, 2022 - proceedings.neurips.cc
Ever since Reddi et al. pointed out the divergence issue of Adam, many
new variants have been designed to obtain convergence. However, vanilla Adam remains …

On empirical comparisons of optimizers for deep learning

D Choi, CJ Shallue, Z Nado, J Lee… - arXiv preprint arXiv …, 2019 - arxiv.org
Selecting an optimizer is a central step in the contemporary deep learning pipeline. In this
paper, we demonstrate the sensitivity of optimizer comparisons to the hyperparameter tuning …