Google Scholar

The curious price of distributional robustness in reinforcement learning with a generative model

L Shi, G Li, Y Wei, Y Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc

This paper investigates model robustness in reinforcement learning (RL) via the framework
of distributionally robust Markov decision processes (RMDPs). Despite recent efforts, the …

Cited by 39 · All 10 versions

[PDF] arxiv.org

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org

The Annals of Statistics 2024, Vol. 52, No. 1, 233–260, https://doi.org/10.1214/23-AOS2342

Cited by 91 · All 8 versions

[PDF] ieee.org

The efficacy of pessimism in asynchronous Q-learning

Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org

This paper is concerned with the asynchronous form of Q-learning, which applies a
stochastic approximation scheme to Markovian data samples. Motivated by the recent …

Cited by 61 · All 8 versions

[PDF] mlr.press

Settling the sample complexity of online reinforcement learning

Z Zhang, Y Chen, JD Lee… - The Thirty Seventh Annual …, 2024 - proceedings.mlr.press

A central issue lying at the heart of online reinforcement learning (RL) is data efficiency.
While a number of recent works achieved asymptotically minimal regret in online RL, the …

Cited by 22 · All 3 versions

[PDF] neurips.cc

When is agnostic reinforcement learning statistically tractable?

Z Jia, G Li, A Rakhlin, A Sekhari… - Advances in Neural …, 2024 - proceedings.neurips.cc

We study the problem of agnostic PAC reinforcement learning (RL): given a policy class $\Pi$,
how many rounds of interaction with an unknown MDP (with a potentially large state and …

Cited by 4 · All 6 versions

[PDF] neurips.cc

Reward-agnostic fine-tuning: Provable statistical benefits of hybrid reinforcement learning

G Li, W Zhan, JD Lee, Y Chi… - Advances in Neural …, 2024 - proceedings.neurips.cc

This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes
access to both an offline dataset and online interactions with the unknown environment. A …

Cited by 15 · All 9 versions

[PDF] neurips.cc

Optimal treatment allocation for efficient policy evaluation in sequential decision making

T Li, C Shi, J Wang, F Zhou - Advances in Neural …, 2024 - proceedings.neurips.cc

A/B testing is critical for modern technological companies to evaluate the effectiveness of
newly developed products against standard baselines. This paper studies optimal designs …

Cited by 6 · All 6 versions

[HTML] sciencedirect.com

Improved exploration–exploitation trade-off through adaptive prioritized experience replay

H Hassani, S Nikan, A Shami - Neurocomputing, 2025 - Elsevier

Experience replay is an indispensable part of deep reinforcement learning algorithms that
enables the agent to revisit and reuse its past and recent experiences to update the network …

Cited by 1 · All 2 versions

[PDF] openreview.net

Is Inverse Reinforcement Learning Harder than Standard Reinforcement Learning?

L Zhao, M Wang, Y Bai - 2023 - openreview.net

Inverse Reinforcement Learning (IRL), the problem of learning reward functions from
demonstrations of an expert policy, plays a critical role in developing intelligent …

Cited by 4 · All 3 versions

[PDF] arxiv.org

Near-Optimal Reinforcement Learning with Self-Play under Adaptivity Constraints

D Qiao, YX Wang - arXiv preprint arXiv:2402.01111, 2024 - arxiv.org

We study the problem of multi-agent reinforcement learning (MARL) with adaptivity
constraints, a new problem motivated by real-world applications where deployments of new …

Cited by 1 · All 3 versions
