Google Scholar

A Survey of Sim-to-Real Methods in RL: Progress, Prospects and Challenges with Foundation Models

L Da, J Turnau, TP Kutralingam, A Velasquez… - arXiv preprint arXiv …, 2025 - arxiv.org

Deep Reinforcement Learning (RL) has been explored and verified to be effective in solving
decision-making tasks in various domains, such as robotics, transportation, recommender …

[PDF] openreview.net

DPM: Dual Preferences-based Multi-Agent Reinforcement Learning

S Kang, Y Lee, M Kim, J Oh, S Chong, SY Yun - 2024 - openreview.net

Preference-based Reinforcement Learning (PbRL), which optimizes reward functions using
preference feedback, is a promising approach for environments where handcrafted reward …

Cited by 2 - All 2 versions

[PDF] arxiv.org

VLP: Vision-Language Preference Learning for Embodied Manipulation

R Liu, C Bai, J Lyu, S Sun, Y Du, X Li - arXiv preprint arXiv:2502.11918, 2025 - arxiv.org

Reward engineering is one of the key challenges in Reinforcement Learning (RL).
Preference-based RL effectively addresses this issue by learning from human feedback …

[PDF] arxiv.org

RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision

G Xiong, Q Jin, X Wang, Y Fang, H Liu, Y Yang… - arXiv preprint arXiv …, 2025 - arxiv.org

Retrieval-augmented generation (RAG) has shown great potential for knowledge-intensive
tasks, but its traditional architectures rely on static retrieval, limiting their effectiveness for …

[PDF] arxiv.org

Multilinguality in LLM-Designed Reward Functions for Restless Bandits: Effects on Task Performance and Fairness

A Parthasarathy, C Subramanian, G Senrayan… - arXiv preprint arXiv …, 2025 - arxiv.org

Restless Multi-Armed Bandits (RMABs) have been successfully applied to resource
allocation problems in a variety of settings, including public health. With the rapid …

All 2 versions

Saved to My library

A Large Language Model-Driven Reward Design Framework via Dynamic Feedback for Reinforcement...

A Survey of Sim-to-Real Methods in RL: Progress, Prospects and Challenges with Foundation Models

DPM: Dual Preferences-based Multi-Agent Reinforcement Learning

VLP: Vision-Language Preference Learning for Embodied Manipulation

RAG-Gym: Optimizing Reasoning and Search Agents with Process Supervision

Multilinguality in LLM-Designed Reward Functions for Restless Bandits: Effects on Task Performance and Fairness