Toward a theoretical foundation of policy optimization for learning control policies
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …
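To make the object of study concrete: the canonical setting in this literature is the linear quadratic regulator (LQR), where gradient descent is run directly on the feedback gain. Below is a minimal sketch of that idea in its simplest scalar form; the dynamics constants, initialization, and step size are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Minimal sketch of policy optimization for control in its simplest
# instance: gradient descent directly on the gain k of a scalar LQR
# problem (dynamics x' = a*x + b*u, policy u = -k*x). All constants,
# the initialization, and the step size are illustrative assumptions.
a, b, q, r, horizon = 1.1, 1.0, 1.0, 1.0, 50

def cost(k):
    """Finite-horizon LQR cost of the policy u = -k*x from x0 = 1."""
    x, J = 1.0, 0.0
    for _ in range(horizon):
        u = -k * x
        J += q * x**2 + r * u**2
        x = a * x + b * u
    return J

def grad(k, eps=1e-6):
    """Finite-difference estimate of dJ/dk."""
    return (cost(k + eps) - cost(k - eps)) / (2 * eps)

k = 1.0                      # start from a stabilizing gain
for _ in range(100):
    k -= 0.05 * grad(k)      # plain gradient descent on J(k)
print(f"learned gain k = {k:.3f}, cost = {cost(k):.3f}")
```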
Nonconvex optimization meets low-rank matrix factorization: An overview
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
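A minimal sketch of the factorization approach the snippet refers to: recover a low-rank matrix by gradient descent on the nonconvex objective f(U, V) = (1/2)‖UVᵀ − M‖²_F. Problem sizes, initialization scale, and step size are assumptions for illustration.

```python
import numpy as np

# Minimal sketch of the nonconvex factorization approach: recover a
# rank-r matrix M by gradient descent on f(U, V) = ||U V^T - M||_F^2 / 2.
# Problem sizes, step size, and iteration count are illustrative.
rng = np.random.default_rng(0)
n, m, r = 50, 40, 3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))  # ground truth

U = 0.01 * rng.standard_normal((n, r))    # small random initialization
V = 0.01 * rng.standard_normal((m, r))
eta = 0.002                               # step size

for _ in range(2000):
    Rgap = U @ V.T - M                    # residual
    # gradients: df/dU = Rgap @ V, df/dV = Rgap^T @ U (simultaneous update)
    U, V = U - eta * Rgap @ V, V - eta * Rgap.T @ U

print("relative error:", np.linalg.norm(U @ V.T - M) / np.linalg.norm(M))
```

Despite nonconvexity, small random initialization with a modest step size typically drives the relative error to near zero in this setting, which is the phenomenon this line of provable-nonconvex work explains.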
Edge artificial intelligence for 6G: Vision, enabling technologies, and applications
The thriving of artificial intelligence (AI) applications is driving the further evolution of
wireless networks. It has been envisioned that 6G will be transformative and will …
SF-FWA: A Self-Adaptive Fast Fireworks Algorithm for effective large-scale optimization
M Chen, Y Tan - Swarm and Evolutionary Computation, 2023 - Elsevier
Computationally efficient algorithms for large-scale black-box optimization have become
increasingly important in recent years due to the growing complexity of engineering and …
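For context, here is a minimal sketch of the basic fireworks-algorithm loop (better fireworks explode with smaller amplitude and produce more sparks, concentrating search near promising points). This illustrates the algorithm family only; it does not reproduce SF-FWA's self-adaptive mechanisms, and the objective, population size, and schedule are assumed for the demo.

```python
import numpy as np

# Minimal sketch of a basic fireworks-algorithm loop for black-box
# minimization. Not the SF-FWA variant: no self-adaptation is modeled,
# and all hyperparameters are assumptions.
rng = np.random.default_rng(1)

def sphere(x):                       # assumed toy objective
    return float(np.sum(x ** 2))

dim, n_fireworks, generations = 10, 5, 200
pop = rng.uniform(-5, 5, size=(n_fireworks, dim))

for _ in range(generations):
    fit = np.array([sphere(x) for x in pop])
    rank = fit.argsort().argsort()               # 0 = best firework
    cands = [pop[fit.argmin()]]                  # keep the current best
    for i, x in enumerate(pop):
        amp = 0.1 + 2.0 * rank[i] / n_fireworks  # worse -> wider explosion
        n_sparks = n_fireworks - rank[i]         # better -> more sparks
        for _ in range(int(n_sparks)):
            cands.append(x + rng.uniform(-amp, amp, size=dim))
    cands = np.array(cands)
    cfit = np.array([sphere(c) for c in cands])
    pop = cands[cfit.argsort()[:n_fireworks]]    # greedy selection

print("best value found:", sphere(pop[0]))
```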
Sophia: A scalable stochastic second-order optimizer for language model pre-training
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction in the time and cost of training …
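A sketch of the Sophia-style update on a toy objective, assuming a Hutchinson-type probe for the diagonal Hessian: an EMA of gradients is divided by an EMA of a periodically refreshed Hessian-diagonal estimate, then clipped coordinate-wise. Hyperparameters and the objective are illustrative, not the paper's settings.

```python
import torch

# Sketch of a Sophia-style update: precondition an EMA of gradients (m)
# by an EMA of a stochastic diagonal Hessian estimate (h), refreshed
# every k steps via Hutchinson's estimator, with coordinate-wise clipping.
torch.manual_seed(0)
w = torch.zeros(5, requires_grad=True)
target = torch.arange(5.0)
loss_fn = lambda w: ((w - target) ** 2 * torch.arange(1.0, 6.0)).sum()

m, h = torch.zeros(5), torch.zeros(5)
lr, b1, b2, gamma, eps, k = 0.2, 0.9, 0.99, 1.0, 1e-8, 5

for t in range(100):
    loss = loss_fn(w)
    (g,) = torch.autograd.grad(loss, w, create_graph=True)
    m = b1 * m + (1 - b1) * g.detach()
    if t % k == 0:  # refresh the Hessian-diagonal estimate every k steps
        z = torch.randint(0, 2, (5,)).float() * 2 - 1   # Rademacher probe
        (hz,) = torch.autograd.grad(g @ z, w)           # Hessian-vector product
        h = b2 * h + (1 - b2) * (z * hz)                # Hutchinson estimate
    with torch.no_grad():
        # clipped, Hessian-preconditioned step
        w -= lr * torch.clamp(m / torch.clamp(gamma * h, min=eps), -1, 1)

print("learned w:", w.detach(), "target:", target)
```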
Understanding gradient descent on the edge of stability in deep learning
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
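The stability threshold at the heart of EoS is visible already on a one-dimensional quadratic, where GD contracts exactly when the curvature (sharpness) stays below 2/LR. A minimal numerical illustration with assumed values:

```python
import numpy as np

# GD on the quadratic 0.5*lam*x^2 iterates x -> (1 - eta*lam) x, so the
# loss diverges once the sharpness lam exceeds 2/eta. This is the
# threshold that the Edge of Stability phase hovers around.
def gd_final_loss(lam, eta, steps=500, x0=1.0):
    x = x0
    for _ in range(steps):
        x -= eta * lam * x          # gradient of 0.5*lam*x^2 is lam*x
    return 0.5 * lam * x ** 2

eta = 0.1                            # stability threshold 2/eta = 20
for lam in (10.0, 19.9, 20.1):
    print(f"sharpness {lam:5.1f}: final loss {gd_final_loss(lam, eta):.3e}")
# sharpness below 20 contracts; just above 20 the iterates oscillate and grow
```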
A novel approach to large-scale dynamically weighted directed network representation
A dynamically weighted directed network (DWDN) is frequently encountered in various big
data-related applications like a terminal interaction pattern analysis system (TIPAS) …
Understanding contrastive learning requires incorporating inductive biases
Contrastive learning is a popular form of self-supervised learning that encourages
augmentations (views) of the same input to have more similar representations compared to …
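A minimal sketch of the objective this snippet describes, in the common InfoNCE form: embeddings of two views of the same input are pulled together while the other batch elements act as negatives. Batch size, embedding dimension, and temperature are assumptions.

```python
import torch
import torch.nn.functional as F

# InfoNCE-style contrastive loss: positives are the two views of the
# same input (the diagonal of the similarity matrix); all other batch
# entries serve as negatives.
def info_nce(z1, z2, tau=0.1):
    """z1, z2: (batch, dim) embeddings of two views of the same inputs."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau            # cosine similarities, scaled
    labels = torch.arange(z1.size(0))   # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

z1, z2 = torch.randn(8, 32), torch.randn(8, 32)
print("loss on random embeddings:", info_nce(z1, z2).item())
```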
Meta-learning with implicit gradients
A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on
prior experience. Gradient (or optimization) based meta-learning has recently emerged as …
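The implicit-gradient idea can be shown on a toy problem where the inner adaptation is a regularized quadratic, so the implicit function theorem gives the meta-gradient in closed form without differentiating through the inner optimizer. All problem data below are assumed for illustration; the finite-difference check confirms the formula.

```python
import numpy as np

# Implicit meta-gradient (iMAML-style) on a toy quadratic inner problem:
# theta*(phi) = argmin_th 0.5 th^T A th - b^T th + lam/2 ||th - phi||^2.
rng = np.random.default_rng(2)
d, lam = 4, 1.0
A_in = rng.standard_normal((d, d)); A_in = A_in @ A_in.T + np.eye(d)
b_in = rng.standard_normal(d)
theta_out = rng.standard_normal(d)   # outer loss: 0.5||theta - theta_out||^2

def inner_solution(phi):
    # closed-form minimizer of the regularized inner loss
    return np.linalg.solve(A_in + lam * np.eye(d), b_in + lam * phi)

phi = np.zeros(d)
theta = inner_solution(phi)
g_out = theta - theta_out            # gradient of outer loss at theta*
# Implicit function theorem: d theta*/d phi = lam (A_in + lam I)^{-1},
# so the meta-gradient needs no backprop through inner optimization steps.
meta_grad = lam * np.linalg.solve(A_in + lam * np.eye(d), g_out)

# Verify against a finite-difference meta-gradient.
eps, fd = 1e-6, np.zeros(d)
for i in range(d):
    e = np.zeros(d); e[i] = eps
    up = 0.5 * np.sum((inner_solution(phi + e) - theta_out) ** 2)
    dn = 0.5 * np.sum((inner_solution(phi - e) - theta_out) ** 2)
    fd[i] = (up - dn) / (2 * eps)
print("max abs diff vs finite differences:", np.abs(meta_grad - fd).max())
```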
Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks
Recent works have cast some light on the mystery of why deep nets fit any data and
generalize despite being very overparametrized. This paper analyzes training and …
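Concretely, this style of analysis controls training speed and generalization through the infinite-width Gram (NTK) matrix H∞; for a two-layer ReLU network on unit-norm inputs it has the closed form sketched below, and a positive smallest eigenvalue is the assumption under which gradient descent provably fits the data. The data here is random and assumed.

```python
import numpy as np

# Infinite-width NTK Gram matrix for a two-layer ReLU network on
# unit-norm inputs: H_ij = <x_i, x_j> (pi - arccos<x_i, x_j>) / (2 pi).
rng = np.random.default_rng(3)
n, d = 20, 5
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs

G = np.clip(X @ X.T, -1.0, 1.0)                 # pairwise inner products
H = G * (np.pi - np.arccos(G)) / (2 * np.pi)    # H_infty for ReLU features

eig = np.linalg.eigvalsh(H)
print("smallest eigenvalue of H_infty:", eig.min())
# A strictly positive smallest eigenvalue is the condition under which
# this analysis guarantees that gradient descent fits the training data.
```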