Is RLHF more difficult than standard RL? A theoretical perspective

Y Wang, Q Liu, C Jin - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Abstract: Reinforcement Learning from Human Feedback (RLHF) learns from preference
signals, while standard Reinforcement Learning (RL) directly learns from reward signals …
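The snippet contrasts preference signals with reward signals. A minimal sketch of the standard way such preferences are turned into a reward model, via maximum likelihood under the Bradley-Terry model (the linear reward model, synthetic data, and all names here are hypothetical illustrations, not taken from the paper):

```python
import numpy as np

# In RLHF the learner observes which of two trajectories a labeler prefers,
# not a scalar reward. Under the Bradley-Terry model, the probability that
# trajectory a is preferred over b is sigmoid(r(a) - r(b)).

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def preference_nll(theta, feats_a, feats_b, prefs):
    """Negative log-likelihood of observed preferences under a
    (hypothetical) linear reward model r(traj) = theta . feats(traj)."""
    diff = feats_a @ theta - feats_b @ theta  # r(a) - r(b) per pair
    p = sigmoid(diff)
    eps = 1e-12  # guard against log(0)
    return -np.mean(prefs * np.log(p + eps) + (1 - prefs) * np.log(1 - p + eps))

# Fit by plain gradient descent on synthetic pairs drawn from a true theta.
rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0])
Xa = rng.normal(size=(500, 2))       # features of trajectory a per pair
Xb = rng.normal(size=(500, 2))       # features of trajectory b per pair
prefs = (rng.random(500) < sigmoid((Xa - Xb) @ true_theta)).astype(float)

theta = np.zeros(2)
for _ in range(2000):
    diff = (Xa - Xb) @ theta
    # gradient of the NLL above with respect to theta
    grad = -((prefs - sigmoid(diff))[:, None] * (Xa - Xb)).mean(axis=0)
    theta -= 0.1 * grad

# The fitted theta should recover the sign pattern of true_theta.
```

The learned reward can then be handed to any standard RL algorithm, which is exactly the reduction whose statistical cost the paper analyzes.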

Unified algorithms for RL with decision-estimation coefficients: No-regret, PAC, and reward-free learning

F Chen, S Mei, Y Bai - arXiv preprint arXiv:2209.11745, 2022 - arxiv.org
Finding unified complexity measures and algorithms for sample-efficient learning is a central
topic of research in reinforcement learning (RL). The Decision-Estimation Coefficient (DEC) …

Making RL with preference-based feedback efficient via randomization

R Wu, W Sun - arXiv preprint arXiv:2310.14554, 2023 - arxiv.org
Reinforcement Learning algorithms that learn from human feedback (RLHF) need to be
efficient in terms of statistical complexity, computational complexity, and query complexity. In …

Lower bounds for learning in revealing POMDPs

F Chen, H Wang, C Xiong, S Mei… - … Conference on Machine …, 2023 - proceedings.mlr.press
This paper studies the fundamental limits of reinforcement learning (RL) in the challenging
partially observable setting. While it is well-established that learning in Partially Observable …

On the complexity of multi-agent decision making: From learning in games to partial monitoring

D Foster, DJ Foster, N Golowich… - The Thirty Sixth …, 2023 - proceedings.mlr.press
A central problem in the theory of multi-agent reinforcement learning (MARL) is to
understand what structural conditions and algorithmic principles lead to sample-efficient …

Posterior sampling for competitive RL: function approximation and partial observation

S Qiu, Z Dai, H Zhong, Z Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper investigates posterior sampling algorithms for competitive reinforcement learning
(RL) in the context of general function approximations. Focusing on zero-sum Markov games …

Partially observable RL with B-stability: Unified structural condition and sharp sample-efficient algorithms

F Chen, Y Bai, S Mei - arXiv preprint arXiv:2209.14990, 2022 - arxiv.org
Partial observability, where agents can only observe partial information about the true
underlying state of the system, is ubiquitous in real-world applications of Reinforcement …

Optimistic policy gradient in multi-player Markov games with a single controller: Convergence beyond the Minty property

I Anagnostides, I Panageas, G Farina… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Policy gradient methods enjoy strong practical performance in numerous tasks in
reinforcement learning. Their theoretical understanding in multiagent settings, however …

Provably efficient UCB-type algorithms for learning predictive state representations

R Huang, Y Liang, J Yang - arXiv preprint arXiv:2307.00405, 2023 - arxiv.org
The general sequential decision-making problem, which includes Markov decision
processes (MDPs) and partially observable MDPs (POMDPs) as special cases, aims at …
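The paper studies UCB-type algorithms for predictive state representations; the optimism principle they rely on is easiest to see in the simpler bandit setting. A minimal sketch of the classic UCB1 index on a multi-armed bandit (the arm means and horizon are hypothetical, and this is not the PSR algorithm of the paper):

```python
import numpy as np

# UCB-type methods act optimistically: each arm's index is its empirical
# mean plus a confidence bonus that shrinks as the arm is pulled more.

def ucb1(means, horizon, seed=0):
    """Run UCB1 on a Bernoulli bandit with the given arm means."""
    rng = np.random.default_rng(seed)
    k = len(means)
    counts = np.zeros(k)   # pulls per arm
    sums = np.zeros(k)     # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1    # pull each arm once to initialize
        else:
            # optimistic index: empirical mean + confidence bonus
            bonus = np.sqrt(2.0 * np.log(t) / counts)
            arm = int(np.argmax(sums / counts + bonus))
        reward = float(rng.random() < means[arm])  # Bernoulli draw
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
# The best arm (mean 0.8) should receive the vast majority of pulls.
```

The PSR setting replaces the per-arm confidence bonus with a bonus over an estimated low-rank dynamics model, but the "explore what is uncertain, exploit what looks best" logic is the same.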

Less is more: Robust robot learning via partially observable multi-agent reinforcement learning

W Zhao, EA Rantala, J Pajarinen… - arXiv preprint arXiv …, 2023 - arxiv.org
In many multi-agent and high-dimensional robotic tasks, the controller can be designed in
either a centralized or decentralized way. Correspondingly, it is possible to use either single …