Adversarial training for high-stakes reliability

D Ziegler, S Nix, L Chan, T Bauman… - Advances in neural …, 2022 - proceedings.neurips.cc
In the future, powerful AI systems may be deployed in high-stakes settings, where a single
failure could be catastrophic. One technique for improving AI safety in high-stakes settings is …

Multi-modal inverse constrained reinforcement learning from a mixture of demonstrations

G Qiao, G Liu, P Poupart, Z Xu - Advances in Neural …, 2023 - proceedings.neurips.cc
Inverse Constraint Reinforcement Learning (ICRL) aims to recover the underlying
constraints respected by expert agents in a data-driven manner. Existing ICRL algorithms …

Scalable Bayesian inverse reinforcement learning

AJ Chan, M van der Schaar - arXiv preprint arXiv:2102.06483, 2021 - arxiv.org
Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the
inverse reinforcement learning problem. Unfortunately current methods generally do not …

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Fast Bellman updates for Wasserstein distributionally robust MDPs

Z Yu, L Dai, S Xu, S Gao, CP Ho - Advances in Neural …, 2023 - proceedings.neurips.cc
Markov decision processes (MDPs) often suffer from the sensitivity issue under model
ambiguity. In recent years, robust MDPs have emerged as an effective framework to …

Entropic risk optimization in discounted MDPs

JL Hau, M Petrik… - … Conference on Artificial …, 2023 - proceedings.mlr.press
Risk-averse Markov Decision Processes (MDPs) have optimal policies that achieve
high returns with low variability, but these MDPs are often difficult to solve. Only a few …

STAP: Sequencing task-agnostic policies

C Agia, T Migimatsu, J Wu… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Advances in robotic skill acquisition have made it possible to build general-purpose libraries
of learned skills for downstream manipulation tasks. However, naively executing these skills …

Partially observable task and motion planning with uncertainty and risk awareness

A Curtis, G Matheos, N Gothoskar… - arXiv preprint arXiv …, 2024 - arxiv.org
Integrated task and motion planning (TAMP) has proven to be a valuable approach to
generalizable long-horizon robotic manipulation and navigation problems. However, the …

Aligning human preferences with baseline objectives in reinforcement learning

D Marta, S Holk, C Pek, J Tumova… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Practical implementations of deep reinforcement learning (deep RL) have been challenging
due to an amplitude of factors, such as designing reward functions that cover every possible …

Policy gradient Bayesian robust optimization for imitation learning

Z Javed, DS Brown, S Sharma, J Zhu… - International …, 2021 - proceedings.mlr.press
The difficulty in specifying rewards for many real-world problems has led to an increased
focus on learning rewards from human feedback, such as demonstrations. However, there …