Toward a theoretical foundation of policy optimization for learning control policies

B Hu, K Zhang, N Li, M Mesbahi… - Annual Review of …, 2023 - annualreviews.org
Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …

Nonconvex optimization meets low-rank matrix factorization: An overview

Y Chi, YM Lu, Y Chen - IEEE Transactions on Signal …, 2019 - ieeexplore.ieee.org
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …

Edge artificial intelligence for 6G: Vision, enabling technologies, and applications

KB Letaief, Y Shi, J Lu, J Lu - IEEE Journal on Selected Areas …, 2021 - ieeexplore.ieee.org
The thriving of artificial intelligence (AI) applications is driving the further evolution of
wireless networks. It has been envisioned that 6G will be transformative and will …

SF-FWA: A Self-Adaptive Fast Fireworks Algorithm for effective large-scale optimization

M Chen, Y Tan - Swarm and Evolutionary Computation, 2023 - Elsevier
Computationally efficient algorithms for large-scale black-box optimization have become
increasingly important in recent years due to the growing complexity of engineering and …

Sophia: A scalable stochastic second-order optimizer for language model pre-training

H Liu, Z Li, D Hall, P Liang, T Ma - arXiv preprint arXiv:2305.14342, 2023 - arxiv.org
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction on the time and cost of training …

Understanding gradient descent on the edge of stability in deep learning

S Arora, Z Li, A Panigrahi - International Conference on …, 2022 - proceedings.mlr.press
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …

A novel approach to large-scale dynamically weighted directed network representation

X Luo, H Wu, Z Wang, J Wang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
A dynamically weighted directed network (DWDN) is frequently encountered in various big
data-related applications like a terminal interaction pattern analysis system (TIPAS) …

Understanding contrastive learning requires incorporating inductive biases

N Saunshi, J Ash, S Goel, D Misra… - International …, 2022 - proceedings.mlr.press
Contrastive learning is a popular form of self-supervised learning that encourages
augmentations (views) of the same input to have more similar representations compared to …

Meta-learning with implicit gradients

A Rajeswaran, C Finn, SM Kakade… - Advances in neural …, 2019 - proceedings.neurips.cc
A core capability of intelligent systems is the ability to quickly learn new tasks by drawing on
prior experience. Gradient (or optimization) based meta-learning has recently emerged as …

Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks

S Arora, S Du, W Hu, Z Li… - … Conference on Machine …, 2019 - proceedings.mlr.press
Recent works have cast some light on the mystery of why deep nets fit any data and
generalize despite being very overparametrized. This paper analyzes training and …