Google Scholar

C Gulcehre, TL Paine, S Srinivasan… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) can improve the quality of large
language model (LLM) outputs by aligning them with human preferences. We propose a …

Cited by 235 - All 5 versions

[PDF] neurips.cc

Large language models play StarCraft II: Benchmarks and a chain of summarization approach

W Ma, Q Mi, Y Zeng, X Yan, R Lin… - Advances in …, 2025 - proceedings.neurips.cc

With the continued advancement of Large Language Model (LLM) agents in reasoning,
planning, and decision-making, benchmarks have become crucial in evaluating these skills …

Cited by 44 - All 5 versions

[PDF] neurips.cc

Efficient diffusion policies for offline reinforcement learning

B Kang, X Ma, C Du, T Pang… - Advances in Neural …, 2023 - proceedings.neurips.cc

Offline reinforcement learning (RL) aims to learn optimal policies from offline datasets,
where the parameterization of policies is crucial but often overlooked. Recently, Diffusion-QL …

Cited by 58 - All 5 versions

[PDF] mlr.press

Plan better amid conservatism: Offline multi-agent reinforcement learning with actor rectification

L Pan, L Huang, T Ma, H Xu - International conference on …, 2022 - proceedings.mlr.press

Conservatism has led to significant progress in offline reinforcement learning (RL) where an
agent learns from pre-collected datasets. However, as many real-world scenarios involve …

Cited by 62 - All 5 versions

[PDF] neurips.cc

Large-scale retrieval for reinforcement learning

P Humphreys, A Guez, O Tieleman… - Advances in …, 2022 - proceedings.neurips.cc

Effective decision making involves flexibly relating past experiences and relevant contextual
information to a novel situation. In deep reinforcement learning (RL), the dominant paradigm …

Cited by 26 - All 8 versions

[PDF] neurips.cc

Hokoff: Real game dataset from Honor of Kings and its offline reinforcement learning benchmarks

Y Qu, B Wang, J Shao, Y Jiang… - Advances in …, 2023 - proceedings.neurips.cc

The advancement of Offline Reinforcement Learning (RL) and Offline Multi-Agent
Reinforcement Learning (MARL) critically depends on the availability of high-quality, pre …

Cited by 7 - All 7 versions

[PDF] arxiv.org

An empirical study of implicit regularization in deep offline RL

C Gulcehre, S Srinivasan, J Sygnowski… - arXiv preprint arXiv …, 2022 - arxiv.org

Deep neural networks are the most commonly used function approximators in offline
reinforcement learning. Prior works have shown that neural nets trained with TD-learning …

Cited by 17 - All 4 versions

[PDF] arxiv.org

Learning to reach goals via diffusion

V Jain, S Ravanbakhsh - arXiv preprint arXiv:2310.02505, 2023 - arxiv.org

We present a novel perspective on goal-conditioned reinforcement learning by framing it
within the context of denoising diffusion models. Analogous to the diffusion process, where …

Cited by 5 - All 6 versions

[PDF] arxiv.org

A new approach to solving SMAC task: Generating decision tree code from large language models

Y Deng, W Ma, Y Fan, Y Zhang, H Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

StarCraft Multi-Agent Challenge (SMAC) is one of the most commonly used experimental
environments in multi-agent reinforcement learning (MARL), where the specific task is to …

Cited by 1 - All 2 versions

[PDF] mi-research.net

Guided Proximal Policy Optimization with Structured Action Graph for Complex Decision-making

Y Yang, D Xing, W Xia, P Wang - Machine Intelligence Research, 2025 - Springer

Reinforcement learning encounters formidable challenges when tasked with intricate
decision-making scenarios, primarily due to the expansive parameterized action spaces and …

All 3 versions

Saved to "My library"

StarCraft II Unplugged: Large scale offline reinforcement learning

Reinforced self-training (ReST) for language modeling
