Deep learning enabled inverse design in nanophotonics

S So, T Badloe, J Noh, J Bravo-Abad, J Rho - Nanophotonics, 2020 - degruyter.com
Deep learning has become the dominant approach in artificial intelligence to solve complex
data-driven problems. Originally applied almost exclusively in computer-science areas such …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

A simple baseline for Bayesian uncertainty in deep learning

WJ Maddox, P Izmailov, T Garipov… - Advances in neural …, 2019 - proceedings.neurips.cc
We propose SWA-Gaussian (SWAG), a simple, scalable, and general purpose
approach for uncertainty representation and calibration in deep learning. Stochastic Weight …
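A minimal sketch of the SWAG approximation, assuming the paper's standard formulation (this notation is not taken from the snippet above): the posterior over weights is a Gaussian centred at the stochastic-weight-averaging (SWA) mean of SGD iterates θ_1, …, θ_T, with a covariance combining a diagonal and a low-rank term,
\[
\theta \sim \mathcal{N}\!\left(\theta_{\mathrm{SWA}},\; \tfrac{1}{2}\bigl(\Sigma_{\mathrm{diag}} + \Sigma_{\mathrm{low\text{-}rank}}\bigr)\right),
\qquad
\theta_{\mathrm{SWA}} = \frac{1}{T}\sum_{t=1}^{T}\theta_t .
\]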

Transformers in reinforcement learning: a survey

P Agarwal, AA Rahman, PL St-Charles… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have significantly impacted domains like natural language processing,
computer vision, and robotics, where they improve performance compared to other neural …

Deterministic policy gradient algorithms

D Silver, G Lever, N Heess, T Degris… - International …, 2014 - proceedings.mlr.press
In this paper we consider deterministic policy gradient algorithms for reinforcement learning
with continuous actions. The deterministic policy gradient has a particularly appealing form …
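A minimal sketch of that form, assuming the paper's standard notation (deterministic policy μ_θ, its action-value function Q^μ, discounted state distribution ρ^μ): the policy gradient is an expectation over states only, with no integral over actions,
\[
\nabla_\theta J(\mu_\theta) \;=\; \mathbb{E}_{s \sim \rho^{\mu}}\!\left[\, \nabla_\theta \mu_\theta(s)\, \nabla_a Q^{\mu}(s,a)\big|_{a=\mu_\theta(s)} \right].
\]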

When do flat minima optimizers work?

J Kaddour, L Liu, R Silva… - Advances in Neural …, 2022 - proceedings.neurips.cc
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods,
have been shown to improve a neural network's generalization performance over stochastic …

Erdős goes neural: an unsupervised learning framework for combinatorial optimization on graphs

N Karalias, A Loukas - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Combinatorial optimization (CO) problems are notoriously challenging for neural networks,
especially in the absence of labeled instances. This work proposes an unsupervised …

A deep reinforcement-learning approach for inverse kinematics solution of a high degree of freedom robotic manipulator

A Malik, Y Lischuk, T Henderson, R Prazenica - Robotics, 2022 - mdpi.com
Inverse kinematics (IK) is the foundation of robotic manipulator control. Due
to the complexity of derivation, difficulty of computation, and redundancy, traditional IK …

WARP: On the benefits of weight averaged rewarded policies

A Ramé, J Ferret, N Vieillard, R Dadashi… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning from human feedback (RLHF) aligns large language models (LLMs)
by encouraging their generations to have high rewards, using a reward model trained on …

Policy optimization in a noisy neighborhood: On return landscapes in continuous control

N Rahn, P D'Oro, H Wiltzer… - Advances in Neural …, 2024 - proceedings.neurips.cc
Deep reinforcement learning agents for continuous control are known to exhibit significant
instability in their performance over time. In this work, we provide a fresh perspective on …