Google Scholar

LLM-empowered state representation for reinforcement learning

B Wang, Y Qu, Y Jiang, J Shao, C Liu, W Yang… - arXiv preprint arXiv …, 2024 - arxiv.org

Conventional state representations in reinforcement learning often omit critical task-related
details, presenting a significant challenge for value networks in establishing accurate …

Cited by 5 · All 8 versions

Doubly mild generalization for offline reinforcement learning

Y Mao, Q Wang, Y Qu, Y Jiang, X Ji - arXiv preprint arXiv:2411.07934, 2024 - arxiv.org

Offline Reinforcement Learning (RL) suffers from the extrapolation error and value
overestimation. From a generalization perspective, this issue can be attributed to the over …

Cited by 2 · All 5 versions

Theoretical investigations and practical enhancements on tail task risk minimization in meta learning

Y Lv, Q Wang, D Liang, Z Xie - arXiv preprint arXiv:2410.22788, 2024 - arxiv.org

Meta learning is a promising paradigm in the era of large models and task distributional
robustness has become an indispensable consideration in real-world scenarios. Recent …

Cited by 1 · All 3 versions

Latent Reward: LLM-Empowered Credit Assignment in Episodic Reinforcement Learning

Y Qu, Y Jiang, B Wang, Y Mao, C Wang, C Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Reinforcement learning (RL) often encounters delayed and sparse feedback in real-world
applications, even with only episodic rewards. Previous approaches have made some …

Cited by 1 · All 3 versions

Offline Fictitious Self-Play for Competitive Games

J Chen, W Xie, W Zhang, Y Wen - arXiv preprint arXiv:2403.00841, 2024 - arxiv.org

Offline Reinforcement Learning (RL) has received significant interest due to its ability to
improve policies in previously collected datasets without online interactions. Despite its …

All 4 versions

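A survey of game theory based on multi-agent reinforcement learning

**艺春，刘泽娇，洪艺天，王继超，王健瑞， **毅，唐漾 - Acta Automatica Sinica (自动化学报), 2024 - aas.net.cn

Multi-agent reinforcement learning, an interdisciplinary field at the intersection of game theory, control theory, and multi-agent learning, is a frontier direction in multi-agent systems research. It gives agents the ability to complete diverse tasks through interaction and decision-making in dynamic, multi-dimensional, complex environments …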

All 3 versions

Enhancing Decision-Making in Offline Reinforcement Learning: Adaptive, Multi-Agent, and Online Perspectives

Y Zhang - 2024 - ses.library.usyd.edu.au

Inspired by the successful application of large models in natural language processing and
computer vision, both the research community and industry have increasingly focused on …

Hokoff: Real game dataset from honor of kings and its offline reinforcement learning benchmarks

LLM-empowered state representation for reinforcement learning
