Efficient diffusion policies for offline reinforcement learning

B Kang, X Ma, C Du, T Pang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets,
where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL …

Understanding, predicting and better resolving Q-value divergence in offline-RL

Y Yue, R Lu, B Kang, S Song… - Advances in Neural …, 2024 - proceedings.neurips.cc
The divergence of the Q-value estimation has been a prominent issue in offline reinforcement
learning (offline RL), where the agent has no access to real dynamics. Traditional beliefs …

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

T Zhang, J Guan, L Zhao, Y Li, D Li, Z Zeng… - arXiv preprint arXiv …, 2024 - arxiv.org
Offline reinforcement learning (RL) aims to learn optimal policies from previously collected
datasets. Recently, due to their powerful representational capabilities, diffusion models have …

Exclusively Penalized Q-learning for Offline Reinforcement Learning

J Yeom, Y Jo, J Kim, S Lee, S Han - arXiv preprint arXiv:2405.14082, 2024 - arxiv.org
Constraint-based offline reinforcement learning (RL) involves policy constraints or imposing
penalties on the value function to mitigate overestimation errors caused by distributional …

Out-of-Distribution Adaptation in Offline RL: Counterfactual Reasoning via Causal Normalizing Flows

M Cho, JP How, C Sun - arXiv preprint arXiv:2405.03892, 2024 - arxiv.org
Despite notable successes of Reinforcement Learning (RL), the prevalent use of an online
learning paradigm prevents its widespread adoption, especially in hazardous or costly …

UDQL: Bridging The Gap between MSE Loss and The Optimal Value Function in Offline Reinforcement Learning

Y Zhang, R Yu, Z Yao, W Zhang, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The Mean Squared Error (MSE) is commonly utilized to estimate the solution of the optimal
value function in the vast majority of offline reinforcement learning (RL) models and has …

Many of Your DPOs are Secretly One: Attempting Unification Through Mutual Information

R Tutnov, A Grosnit, H Bou-Ammar - arXiv preprint arXiv:2501.01544, 2025 - arxiv.org
Post-alignment of large language models (LLMs) is critical in improving their utility, safety,
and alignment with human intentions. Direct preference optimisation (DPO) has become one …

A Collaborative Perspective on Exploration in Reinforcement Learning

Y Fu, H Zhang, D Wu, W Xu, B Boulet - openreview.net
Exploration is one of the central topics in reinforcement learning (RL). Many existing
approaches take a single-agent perspective when tackling this problem. In this work, we …