- Academic Search

A review of uncertainty for deep reinforcement learning

O Lockwood, M Si - Proceedings of the AAAI Conference on Artificial …, 2022 - ojs.aaai.org

Uncertainty is ubiquitous in games, both in the agents playing games and often in the games
themselves. Working with uncertainty is therefore an important component of successful …

Cited by 61 · All 9 versions

[PDF] neurips.cc

COMBO: Conservative offline model-based policy optimization

T Yu, A Kumar, R Rafailov… - Advances in neural …, 2021 - proceedings.neurips.cc

Model-based reinforcement learning (RL) algorithms, which learn a dynamics
model from logged experience and perform conservative planning under the learned model …

Cited by 455 · All 7 versions

[PDF] arxiv.org

RvS: What is essential for offline RL via supervised learning?

S Emmons, B Eysenbach, I Kostrikov… - ar…

Pessimistic bootstrapping for uncertainty-driven offline reinforcement learning

C Bai, L Wang, Z Yang, Z Deng, A Garg, P Liu… - arXiv preprint arXiv …, 2022 - arxiv.org

Offline Reinforcement Learning (RL) aims to learn policies from previously collected
datasets without exploring the environment. Directly applying off-policy algorithms to offline …

Cited by 161 · All 5 versions

[PDF] arxiv.org

Reward model ensembles help mitigate overoptimization

T Coste, U Anwar, R Kirk, D Krueger - arXiv preprint arXiv:2310.02743, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a standard approach for fine-tuning
large language models to follow instructions. As part of this process, learned reward models …

Cited by 81 · All 4 versions

[PDF] openreview.net

Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint

W Xiong, H Dong, C Ye, Z Wang, H Zhong… - … on Machine Learning, 2024 - openreview.net

This paper studies the theoretical framework of the alignment process of generative models
with Reinforcement Learning from Human Feedback (RLHF). We consider a standard …

Cited by 75 · All 3 versions

[PDF] neurips.cc

A policy-guided imitation approach for offline reinforcement learning

H Xu, L Jiang, L Jianxiong… - Advances in Neural …, 2022 - proceedings.neurips.cc

Offline reinforcement learning (RL) methods can generally be categorized into two types:
RL-based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …

Cited by 68 · All 7 versions

[PDF] neurips.cc

Conformal prediction for uncertainty-aware planning with diffusion dynamics model

J Sun, Y Jiang, J Qiu, P Nobel… - Advances in …, 2024 - proceedings.neurips.cc

Robotic applications often involve working in environments that are uncertain, dynamic, and
partially observable. Recently, diffusion models have been proposed for learning trajectory …

Cited by 30 · All 4 versions