A comprehensive survey on training acceleration for large machine learning models in IoT
The ever-growing artificial intelligence (AI) applications have greatly reshaped our world in
many areas, e.g., smart home, computer vision, natural language processing, etc. Behind …
Why are adaptive methods good for attention models?
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning,
adaptive methods like Clipped SGD/Adam have been observed to outperform SGD across …
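The clipping rule mentioned here is, in its generic form, global-norm gradient clipping applied before an SGD step. Below is a minimal NumPy sketch of that standard rule on a toy quadratic; it is an illustration only, not the specific variant studied in the cited work.

```python
import numpy as np

def clipped_sgd_step(w, grad, lr=0.1, clip_norm=1.0):
    """One SGD step with global-norm gradient clipping (generic form)."""
    norm = np.linalg.norm(grad)
    if norm > clip_norm:
        grad = grad * (clip_norm / norm)  # rescale so the gradient norm is at most clip_norm
    return w - lr * grad

# Toy usage: minimize f(w) = 0.5 * ||w||^2, whose gradient at w is w.
w = np.array([3.0, -4.0])
for _ in range(100):
    w = clipped_sgd_step(w, grad=w)
```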
Faster adaptive federated learning
Federated learning has attracted increasing attention with the emergence of distributed data.
While extensive federated learning algorithms have been proposed for the non-convex …
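For orientation, the baseline that most non-convex federated learning algorithms build on is a FedAvg-style loop: clients run a few local SGD steps and the server averages the returned models. The sketch below shows only that generic pattern; the synthetic clients and least-squares objective are illustrative assumptions, not details from the cited work.

```python
import numpy as np

def local_sgd(w, data, lr=0.05, steps=5):
    """Client update: a few SGD steps on a local least-squares objective."""
    X, y = data
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg_round(w_global, client_datasets):
    """Server update: average the models returned by all clients."""
    local_models = [local_sgd(w_global.copy(), data) for data in client_datasets]
    return np.mean(local_models, axis=0)

# Toy usage with two synthetic clients.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 3)), rng.normal(size=20)) for _ in range(2)]
w = np.zeros(3)
for _ in range(10):
    w = fedavg_round(w, clients)
```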
AdaGrad avoids saddle points
Adaptive first-order methods in optimization have widespread ML applications due to their
ability to adapt to non-convex landscapes. However, their convergence guarantees are …
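The AdaGrad method referenced here uses the standard diagonal update: each coordinate's step size is scaled by the square root of its accumulated squared gradients. A minimal sketch of that textbook rule on a toy quadratic follows; it does not reproduce the paper's saddle-point analysis.

```python
import numpy as np

def adagrad_step(w, grad, accum, lr=0.5, eps=1e-8):
    """Diagonal AdaGrad: scale each coordinate by the root of its accumulated squared gradients."""
    accum = accum + grad ** 2
    w = w - lr * grad / (np.sqrt(accum) + eps)
    return w, accum

# Toy usage on f(w) = 0.5 * ||w||^2 (gradient at w is w).
w, accum = np.array([2.0, -1.0]), np.zeros(2)
for _ in range(200):
    w, accum = adagrad_step(w, grad=w, accum=accum)
```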
Deep equilibrium nets
M Azinovic, L Gaegauf… - International Economic …, 2022 - Wiley Online Library
We introduce deep equilibrium nets (DEQNs)—a deep learning‐based method to compute
approximate functional rational expectations equilibria of economic models featuring a …
Why Adam beats SGD for attention models
While stochastic gradient descent (SGD) is still the de facto algorithm in deep learning,
adaptive methods like Adam have been observed to outperform SGD across important tasks …
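The Adam update these comparisons refer to is the standard bias-corrected moment rule of Kingma and Ba. The following minimal sketch shows that generic update on a toy quadratic; it is not the cited paper's method or experimental setup.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam step with bias-corrected first and second moment estimates."""
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)  # bias correction for the first moment
    v_hat = v / (1 - beta2 ** t)  # bias correction for the second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage on f(w) = 0.5 * ||w||^2 (gradient at w is w).
w, m, v = np.array([1.0, -2.0]), np.zeros(2), np.zeros(2)
for t in range(1, 501):
    w, m, v = adam_step(w, grad=w, m=m, v=v, t=t)
```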
Explicit regularization in overparametrized models via noise injection
Injecting noise within gradient descent has several desirable features, such as smoothing
and regularizing properties. In this paper, we investigate the effects of injecting noise before …
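As a generic illustration of noise injection in gradient descent, the sketch below perturbs the iterate with Gaussian noise before each gradient evaluation (perturbed gradient descent). The Gaussian noise model and the toy quadratic objective are assumptions for illustration, not the estimator analyzed in the cited paper.

```python
import numpy as np

def noisy_gd_step(w, grad_fn, rng, lr=0.1, noise_std=0.01):
    """Perturbed gradient descent: evaluate the gradient at a noise-perturbed iterate."""
    w_perturbed = w + rng.normal(scale=noise_std, size=w.shape)
    return w - lr * grad_fn(w_perturbed)

# Toy usage on f(w) = 0.5 * ||w||^2 (the gradient map is the identity).
rng = np.random.default_rng(1)
w = np.array([1.5, -0.5])
for _ in range(100):
    w = noisy_gd_step(w, grad_fn=lambda x: x, rng=rng)
```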
Self-organizing radial basis function neural network using accelerated second-order learning algorithm
HG Han, ML Ma, HY Yang, JF Qiao - Neurocomputing, 2022 - Elsevier
Gradient-based algorithms are commonly used for training radial basis function neural
networks (RBFNNs). However, it is still difficult to avoid the vanishing gradient and improve the …
Calibrating the adaptive learning rate to improve convergence of ADAM
Adaptive gradient methods (AGMs) have been widely used to optimize nonconvex problems
in the deep learning area. We identify two aspects of AGMs that can be further improved …
Decentralized Riemannian algorithm for nonconvex minimax problems
The minimax optimization over Riemannian manifolds (possibly with nonconvex constraints) has
been actively applied to solve many problems, such as robust dimensionality reduction and …