Deep learning enabled inverse design in nanophotonics
Deep learning has become the dominant approach in artificial intelligence to solve complex
data-driven problems. Originally applied almost exclusively in computer-science areas such …
Open problems and fundamental limitations of reinforcement learning from human feedback
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …
A simple baseline for Bayesian uncertainty in deep learning
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose
approach for uncertainty representation and calibration in deep learning. Stochastic Weight …
Transformers in reinforcement learning: a survey
Transformers have significantly impacted domains like natural language processing,
computer vision, and robotics, where they improve performance compared to other neural …
Deterministic policy gradient algorithms
In this paper we consider deterministic policy gradient algorithms for reinforcement learning
with continuous actions. The deterministic policy gradient has a particularly appealing form …
When do flat minima optimizers work?
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods,
have been shown to improve a neural network's generalization performance over stochastic …
Erdős goes neural: an unsupervised learning framework for combinatorial optimization on graphs
Combinatorial optimization (CO) problems are notoriously challenging for neural networks,
especially in the absence of labeled instances. This work proposes an unsupervised …
A deep reinforcement-learning approach for inverse kinematics solution of a high degree of freedom robotic manipulator
The foundation and emphasis of robotic manipulator control is Inverse Kinematics (IK). Due
to the complexity of derivation, difficulty of computation, and redundancy, traditional IK …
WARP: On the benefits of weight averaged rewarded policies
Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs)
by encouraging their generations to have high rewards, using a reward model trained on …
Policy optimization in a noisy neighborhood: On return landscapes in continuous control
Deep reinforcement learning agents for continuous control are known to exhibit significant
instability in their performance over time. In this work, we provide a fresh perspective on …