Google Scholar

Adversarial training for high-stakes reliability

D Ziegler, S Nix, L Chan, T Bauman… - Advances in Neural …, 2022 - proceedings.neurips.cc

In the future, powerful AI systems may be deployed in high-stakes settings, where a single
failure could be catastrophic. One technique for improving AI safety in high-stakes settings is …

Cited by 55 · Related articles · All 7 versions · HTML version

[PDF] neurips.cc

Multi-modal inverse constrained reinforcement learning from a mixture of demonstrations

G Qiao, G Liu, P Poupart, Z Xu - Advances in Neural …, 2023 - proceedings.neurips.cc

Abstract Inverse Constraint Reinforcement Learning (ICRL) aims to recover the underlying
constraints respected by expert agents in a data-driven manner. Existing ICRL algorithms …

Cited by 15 · Related articles · All 6 versions · HTML version

[PDF] arxiv.org

Scalable Bayesian inverse reinforcement learning

AJ Chan, M van der Schaar - arXiv preprint arXiv:2102.06483, 2021 - arxiv.org

Bayesian inference over the reward presents an ideal solution to the ill-posed nature of the
inverse reinforcement learning problem. Unfortunately current methods generally do not …

Cited by 79 · Related articles · All 5 versions · HTML version

[PDF] neurips.cc

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc

When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Cited by 59 · Related articles · All 11 versions · HTML version
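
The snippet above frames off-policy evaluation as predicting what a new policy would do using data gathered under a different one. As a minimal sketch of that general idea, and not a reproduction of the paper's estimator, ordinary per-trajectory importance sampling can reweight logged returns to estimate the cumulative distribution function (CDF) of the evaluation policy's return, from which a mean or quantile could then be read off. It assumes the behavior policy's action probabilities are known and nonzero for every logged action; all names here (`is_return_cdf`, `pi_e`, `pi_b`, the toy bandit) are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# One logged step: (state, action, reward); a trajectory is a list of steps.
Step = Tuple[int, int, float]
Policy = Callable[[int, int], float]  # policy(action, state) -> probability


def is_return_cdf(
    trajectories: Sequence[Sequence[Step]],
    pi_e: Policy,
    pi_b: Policy,
    nu: float,
    gamma: float = 1.0,
) -> float:
    """Ordinary importance-sampling estimate of P(G <= nu) under pi_e.

    Assumes the trajectories were collected under pi_b and that
    pi_b(a, s) > 0 for every logged (s, a) pair.
    """
    total = 0.0
    for traj in trajectories:
        rho = 1.0  # product of per-step likelihood ratios pi_e / pi_b
        g = 0.0    # discounted return of this trajectory
        for t, (s, a, r) in enumerate(traj):
            rho *= pi_e(a, s) / pi_b(a, s)
            g += (gamma ** t) * r
        total += rho * (1.0 if g <= nu else 0.0)
    return total / len(trajectories)


# Hypothetical one-step bandit: action 1 pays 1.0, action 0 pays 0.0.
def pi_b(a: int, s: int) -> float:
    return 0.5  # behavior policy: uniform over the two actions


def pi_e(a: int, s: int) -> float:
    return 1.0 if a == 1 else 0.0  # evaluation policy: always action 1


logged: List[List[Step]] = [[(0, 0, 0.0)], [(0, 1, 1.0)]]
print(is_return_cdf(logged, pi_e, pi_b, nu=0.5))
print(is_return_cdf(logged, pi_e, pi_b, nu=1.0))
```

With few samples the raw estimate need not be monotone in `nu` or bounded by 1, so in practice one would typically clip or otherwise post-process it before reading off statistics.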

[PDF] neurips.cc

Fast Bellman updates for Wasserstein distributionally robust MDPs

Z Yu, L Dai, S Xu, S Gao, CP Ho - Advances in Neural …, 2023 - proceedings.neurips.cc

Markov decision processes (MDPs) often suffer from the sensitivity issue under model
ambiguity. In recent years, robust MDPs have emerged as an effective framework to …

Cited by 11 · Related articles · All 6 versions · HTML version

[PDF] mlr.press

Entropic risk optimization in discounted MDPs

JL Hau, M Petrik… - … Conference on Artificial …, 2023 - proceedings.mlr.press

Abstract Risk-averse Markov Decision Processes (MDPs) have optimal policies that achieve
high returns with low variability, but these MDPs are often difficult to solve. Only a few …

Cited by 18 · Related articles · All 8 versions · HTML version
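
The snippet above contrasts risk-averse objectives with plain expected return. As a minimal sketch, assuming the standard definition of the entropic risk measure, ERM_β(X) = -(1/β) log E[exp(-βX)] with β > 0 (we assume, without having checked, that this matches the paper's convention), the function below evaluates it for a discrete return distribution. The name `entropic_risk` and the coin-flip example are hypothetical; this is not the paper's code or its MDP solution method.

```python
import math
from typing import Sequence


def entropic_risk(returns: Sequence[float], probs: Sequence[float], beta: float) -> float:
    """Entropic risk measure ERM_beta(X) = -(1/beta) * log E[exp(-beta * X)].

    Assumes a discrete distribution (returns[i] occurs with probability
    probs[i]) and beta > 0, where larger beta means more risk aversion.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if len(returns) != len(probs):
        raise ValueError("returns and probs must have the same length")
    # Log-sum-exp trick: factor out the largest exponent to avoid overflow.
    exponents = [-beta * x for x in returns]
    m = max(exponents)
    s = sum(p * math.exp(e - m) for e, p in zip(exponents, probs))
    return -(m + math.log(s)) / beta


# A coin flip between a return of 0 and a return of 10 (mean 5).
returns, probs = [0.0, 10.0], [0.5, 0.5]
for beta in (1e-6, 0.1, 1.0, 10.0):
    print(beta, round(entropic_risk(returns, probs, beta), 4))
```

Larger β puts more weight on low returns: as β → 0 the value approaches the mean return, and as β grows it approaches the worst-case return, which is the sense in which the measure trades expected return against variability.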

[PDF] arxiv.org

STAP: Sequencing task-agnostic policies

C Agia, T Migimatsu, J Wu… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org

Advances in robotic skill acquisition have made it possible to build general-purpose libraries
of learned skills for downstream manipulation tasks. However, naively executing these skills …

Cited by 23 · Related articles · All 5 versions

[PDF] arxiv.org

Partially observable task and motion planning with uncertainty and risk awareness

A Curtis, G Matheos, N Gothoskar… - arXiv preprint arXiv …, 2024 - arxiv.org

Integrated task and motion planning (TAMP) has proven to be a valuable approach to
generalizable long-horizon robotic manipulation and navigation problems. However, the …

Cited by 6 · Related articles · All 3 versions · HTML version

[PDF] diva-portal.org

Aligning human preferences with baseline objectives in reinforcement learning

D Marta, S Holk, C Pek, J Tumova… - 2023 IEEE international …, 2023 - ieeexplore.ieee.org

Practical implementations of deep reinforcement learning (deep RL) have been challenging
due to an amplitude of factors, such as designing reward functions that cover every possible …

Cited by 13 · Related articles · All 5 versions

[PDF] mlr.press

Policy gradient Bayesian robust optimization for imitation learning

Z Javed, DS Brown, S Sharma, J Zhu… - International …, 2021 - proceedings.mlr.press

The difficulty in specifying rewards for many real-world problems has led to an increased
focus on learning rewards from human feedback, such as demonstrations. However, there …

Cited by 25 · Related articles · All 8 versions · HTML version

Bayesian robust optimization for imitation learning

Adversarial training for high-stakes reliability