Google Scholar

Adaptive discretization in online reinforcement learning

SR Sinclair, S Banerjee, CL Yu - Operations Research, 2023 - pubsonline.informs.org

Discretization-based approaches to solving online reinforcement learning problems are
studied extensively on applications such as resource allocation and cache management …

Cited by 22 · All 6 versions

[PDF] mlr.press

A kernel-based approach to non-stationary reinforcement learning in metric spaces

OD Domingues, P Ménard, M Pirotta… - International …, 2021 - proceedings.mlr.press

In this work, we propose KeRNS: an algorithm for episodic reinforcement learning in non-
stationary Markov Decision Processes (MDPs) whose state-action set is endowed with a …

Cited by 43 · All 11 versions

[PDF] jmlr.org

Q-learning for MDPs with general spaces: Convergence and near optimality via quantization under weak continuity

A Kara, N Saldi, S Yüksel - Journal of Machine Learning Research, 2023 - jmlr.org

Reinforcement learning algorithms often require finiteness of state and action spaces in
Markov decision processes (MDPs) (also called controlled Markov chains) and various …

Cited by 26 · All 6 versions

[PDF] acm.org

Overcoming the long horizon barrier for sample-efficient reinforcement learning with latent low-rank structure

T Sam, Y Chen, CL Yu - Proceedings of the ACM on Measurement and …, 2023 - dl.acm.org

The practicality of reinforcement learning algorithms has been limited due to poor scaling
with respect to the problem size, as the sample complexity of learning an ε-optimal policy is …

Cited by 17 · All 6 versions

[PDF] neurips.cc

Lipschitz bandits with batched feedback

Y Feng, T Wang - Advances in Neural Information …, 2022 - proceedings.neurips.cc

In this paper, we study Lipschitz bandit problems with batched feedback, where the
expected reward is Lipschitz and the reward observations are communicated to the player in …

Cited by 7 · All 9 versions

[PDF] ieee.org

Effects of sampling and prediction horizon in reinforcement learning

P Osinenko, D Dobriborsci - IEEE Access, 2021 - ieeexplore.ieee.org

Plain reinforcement learning (RL) may be prone to loss of convergence, constraint violation,
unexpected performance, etc. Commonly, RL agents undergo extensive learning stages to …

Cited by 7 · All 4 versions

[PDF] arxiv.org

Rich-Observation Reinforcement Learning with Continuous Latent Dynamics

Y Song, L Wu, DJ Foster, A Krishnamurthy - arxiv preprint arxiv …, 2024 - arxiv.org

Sample-efficiency and reliability remain major bottlenecks toward wide adoption of
reinforcement learning algorithms in continuous settings with high-dimensional perceptual …

Cited by 1 · All 3 versions

[PDF] arxiv.org

Rethinking the Intermediate Features in Adversarial Attacks: Misleading Robotic Models via Adversarial Distillation

K Zhao, H Huang, M Li, Y Wu - arxiv preprint arxiv:2411.15222, 2024 - arxiv.org

Language-conditioned robotic learning has significantly enhanced robot adaptability by
enabling a single model to execute diverse tasks in response to verbal commands. Despite …

All 2 versions

Saved to library

Adaptive discretization for model-based reinforcement learning

UCB momentum Q-learning: Correcting the bias without forgetting

Adaptive discretization in online reinforcement learning
