A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

Principled reinforcement learning with human feedback from pairwise or k-wise comparisons

B Zhu, M Jordan, J Jiao - International Conference on …, 2023 - proceedings.mlr.press
We provide a theoretical framework for Reinforcement Learning with Human Feedback
(RLHF). We show that when the underlying true reward is linear, under both Bradley-Terry …

Offline reinforcement learning as one big sequence modeling problem

M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc
Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …

Supervised pretraining can learn in-context reinforcement learning

J Lee, A **e, A Pacchiano, Y Chandak… - Advances in …, 2024 - proceedings.neurips.cc
Large transformer models trained on diverse datasets have shown a remarkable ability to
learn in-context, achieving high few-shot performance on tasks they were not explicitly …

Cal-QL: Calibrated offline RL pre-training for efficient online fine-tuning

M Nakamoto, S Zhai, A Singh… - Advances in …, 2024 - proceedings.neurips.cc
A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization
from existing datasets followed by fast online fine-tuning with limited interaction. However …

COMBO: Conservative offline model-based policy optimization

T Yu, A Kumar, R Rafailov… - Advances in neural …, 2021 - proceedings.neurips.cc
Model-based reinforcement learning (RL) algorithms, which learn a dynamics
model from logged experience and perform conservative planning under the learned model …

Bellman-consistent pessimism for offline reinforcement learning

T **e, CA Cheng, N Jiang, P Mineiro… - Advances in neural …, 2021 - proceedings.neurips.cc
The use of pessimism when reasoning about datasets lacking exhaustive exploration has
recently gained prominence in offline reinforcement learning. Despite the robustness it adds …

Bridging offline reinforcement learning and imitation learning: A tale of pessimism

P Rashidinejad, B Zhu, C Ma, J Jiao… - Advances in Neural …, 2021 - proceedings.neurips.cc
Offline (or batch) reinforcement learning (RL) algorithms seek to learn an optimal policy from
a fixed dataset without active data collection. Based on the composition of the offline dataset …

Adversarially trained actor critic for offline reinforcement learning

CA Cheng, T **e, N Jiang… - … Conference on Machine …, 2022 - proceedings.mlr.press
We propose Adversarially Trained Actor Critic (ATAC), a new model-free algorithm
for offline reinforcement learning (RL) under insufficient data coverage, based on the …

Offline reinforcement learning with realizability and single-policy concentrability

W Zhan, B Huang, A Huang… - … on Learning Theory, 2022 - proceedings.mlr.press
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong
assumptions on both the function classes (e.g., Bellman-completeness) and the data …