Reward is enough

D Silver, S Singh, D Precup, RS Sutton - Artificial Intelligence, 2021 - Elsevier
In this article we hypothesise that intelligence, and its associated abilities, can be
understood as subserving the maximisation of reward. Accordingly, reward is enough to …

Reinforcement learning for intelligent healthcare applications: A survey

A Coronato, M Naeem, G De Pietro, et al. - Artificial Intelligence in Medicine, 2020 - Elsevier
Discovering new treatments and personalizing existing ones is one of the major goals of
modern clinical research. In the last decade, Artificial Intelligence (AI) has enabled the …

A survey of inverse reinforcement learning: Challenges, methods and progress

S Arora, P Doshi - Artificial Intelligence, 2021 - Elsevier
Inverse reinforcement learning (IRL) is the problem of inferring the reward function of an
agent, given its policy or observed behavior. Analogous to RL, IRL is perceived both as a …
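
As a toy illustration of the problem statement above, the sketch below alternates between solving the forward RL problem under a current reward guess and nudging the reward weights so the learner's visitation features match the expert's. The 5-state chain, one-hot features, and step sizes are all invented for illustration, not any specific method from the survey.

```python
import numpy as np

# Toy 5-state chain MDP: actions 0 = left, 1 = right, deterministic moves.
n_states, n_actions, horizon, gamma = 5, 2, 10, 0.9

def step(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

def value_iteration(reward):
    """Greedy policy for a state-only reward via value iteration."""
    V = np.zeros(n_states)
    for _ in range(100):
        Q = np.array([[reward[step(s, a)] + gamma * V[step(s, a)]
                       for a in range(n_actions)] for s in range(n_states)])
        V = Q.max(axis=1)
    return Q.argmax(axis=1)

def visitation(policy, start=0):
    """Discounted state-visitation counts (one-hot state features)."""
    mu, s = np.zeros(n_states), start
    for t in range(horizon):
        mu[s] += gamma ** t
        s = step(s, policy[s])
    return mu

# "Expert" always moves right; its feature expectations are the target.
expert_mu = visitation(np.ones(n_states, dtype=int))

w = np.zeros(n_states)                       # linear reward weights to infer
for it in range(50):
    policy = value_iteration(w)              # forward problem under guess w
    w += 0.1 * (expert_mu - visitation(policy))   # feature-matching step

print("recovered reward weights:", np.round(w, 2))
print("greedy policy under recovered reward:", value_iteration(w))
```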

Reinforcement learning and control as probabilistic inference: Tutorial and review

S Levine - arXiv preprint arXiv:1805.00909, 2018 - arxiv.org
The framework of reinforcement learning or optimal control provides a mathematical
formalization of intelligent decision making that is powerful and broadly applicable. While …
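
The pivotal move in this framework is replacing the hard Bellman max with a log-sum-exp ("soft") maximum, under which the optimal policy becomes a softmax over Q-values: V(s) = log Σ_a exp Q(s,a) and π(a|s) = exp(Q(s,a) − V(s)). A minimal sketch on an invented 5-state chain:

```python
import numpy as np

# Soft value iteration: the log-sum-exp backup replaces the hard max, and
# the resulting policy is a softmax over Q-values. Environment and reward
# are invented for illustration.
n_states, n_actions, gamma = 5, 2, 0.9
reward = np.array([0., 0., 0., 0., 1.])     # reward for landing in a state

def step(s, a):                              # deterministic chain dynamics
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

V = np.zeros(n_states)
for _ in range(200):
    Q = np.array([[reward[step(s, a)] + gamma * V[step(s, a)]
                   for a in range(n_actions)] for s in range(n_states)])
    V = np.log(np.exp(Q).sum(axis=1))        # soft maximum over actions

pi = np.exp(Q - V[:, None])                  # softmax policy, rows sum to 1
print(np.round(pi, 3))
```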

Generative adversarial imitation learning

J Ho, S Ermon - Advances in Neural Information Processing Systems, 2016 - proceedings.neurips.cc
Consider learning a policy from example expert behavior, without interaction with the expert
or access to a reinforcement signal. One approach is to recover the expert's cost function …
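
The adversarial loop can be caricatured in a few lines: a discriminator learns to tell policy samples from expert samples, and the policy is trained with the surrogate reward −log D for fooling it. The single-state bandit below is an invented toy; the paper itself uses neural-network discriminators and TRPO policy updates.

```python
import numpy as np

rng = np.random.default_rng(0)
n_actions = 3
expert_action = 2                        # toy "expert" always picks action 2

theta = np.zeros(n_actions)              # policy logits (softmax policy)
d = np.zeros(n_actions)                  # discriminator logits per action

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for it in range(3000):
    pi = softmax(theta)
    a = int(rng.choice(n_actions, p=pi))  # sample from the current policy

    # Discriminator ascent: D(x) = sigmoid(d[x]) estimates the probability
    # that action x came from the policy rather than the expert.
    D_a = 1.0 / (1.0 + np.exp(-d[a]))
    D_e = 1.0 / (1.0 + np.exp(-d[expert_action]))
    d[a] += 0.05 * (1.0 - D_a)            # gradient of log D(a)
    d[expert_action] -= 0.05 * D_e        # gradient of log(1 - D(expert))

    # Policy step (REINFORCE) on the surrogate reward -log D(a): actions
    # the discriminator mistakes for expert ones earn more reward.
    r = -np.log(D_a + 1e-8)
    grad_logpi = -pi.copy()
    grad_logpi[a] += 1.0
    theta += 0.05 * r * grad_logpi

print("final policy over actions:", np.round(softmax(theta), 3))
```

Because the non-expert actions drive their own discriminator logits up, their surrogate reward decays toward zero, so the policy mass drifts onto the expert action.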

Deep reinforcement learning

SE Li - Reinforcement Learning for Sequential Decision and Optimal Control, 2023 - Springer
Similar to humans, RL agents use interactive learning to successfully obtain satisfactory
decision strategies. However, in many cases, it is desirable to learn directly from …

SQIL: Imitation learning via reinforcement learning with sparse rewards

S Reddy, AD Dragan, S Levine - arXiv preprint arXiv:1905.11108, 2019 - arxiv.org
Learning to imitate expert behavior from demonstrations can be challenging, especially in
environments with high-dimensional, continuous observations and unknown dynamics …
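
The SQIL recipe itself is simple to sketch: label demonstration transitions with reward +1, the agent's own transitions with reward 0, and run ordinary Q-learning on a balanced mix of both buffers. The tabular chain environment below is an invented stand-in (the paper uses soft Q-learning on high-dimensional tasks):

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, alpha, eps = 5, 2, 0.9, 0.5, 0.2

def step(s, a):
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

# Demonstrations: expert always moves right; reward is fixed to +1.
demo = [(s, 1, step(s, 1), 1.0) for s in range(n_states)]
online = []                                  # agent transitions, reward 0

Q = np.zeros((n_states, n_actions))
s = 0
for t in range(5000):
    a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
    s2 = step(s, a)
    online.append((s, a, s2, 0.0))
    s = s2
    # Balanced minibatch: one demo transition, one online transition.
    for (bs, ba, bs2, br) in (demo[rng.integers(len(demo))],
                              online[rng.integers(len(online))]):
        Q[bs, ba] += alpha * (br + gamma * Q[bs2].max() - Q[bs, ba])

print("greedy actions per state:", Q.argmax(axis=1))
```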

CHOMP: Covariant Hamiltonian optimization for motion planning

M Zucker, N Ratliff, AD Dragan, et al. - The International Journal of Robotics Research, 2013 - journals.sagepub.com
In this paper, we present CHOMP (covariant Hamiltonian optimization for motion planning),
a method for trajectory optimization invariant to reparametrization. CHOMP uses functional …
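
The signature update is a gradient step preconditioned by the inverse of a smoothness metric A, so obstacle forces get spread along the whole trajectory instead of kinking individual waypoints. A minimal sketch with one invented circular obstacle (geometry and weights are illustrative, not from the paper):

```python
import numpy as np

T = 30                                        # number of interior waypoints
start, goal = np.array([0.0, 0.0]), np.array([1.0, 0.0])
xi = np.linspace(start, goal, T + 2)[1:-1]    # straight-line initialization

# Smoothness metric A = K^T K from the finite-difference operator K.
K = np.eye(T + 1, T, k=0) - np.eye(T + 1, T, k=-1)
A = K.T @ K
A_inv = np.linalg.inv(A)

center, radius = np.array([0.5, 0.02]), 0.2   # one circular obstacle

def obstacle_grad(pts):
    """Gradient of the quadratic hinge cost 0.5*(radius - dist)^2 inside."""
    diff = pts - center
    dist = np.linalg.norm(diff, axis=1, keepdims=True)
    inside = (dist < radius).astype(float)
    return -inside * (radius - dist) * diff / np.maximum(dist, 1e-8)

for it in range(300):
    smooth_grad = A @ xi                      # pulls waypoints toward a line
    smooth_grad[0] -= start                   # fixed-endpoint corrections
    smooth_grad[-1] -= goal
    grad = smooth_grad + obstacle_grad(xi)
    xi -= 0.05 * (A_inv @ grad)               # covariant (metric-warped) step

clearance = np.linalg.norm(xi - center, axis=1).min() - radius
print("worst waypoint clearance:", round(float(clearance), 3))
```

Dropping the A_inv preconditioner recovers plain gradient descent, which deforms only the waypoints inside the obstacle and produces the kinked trajectories the covariant update is designed to avoid.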

Learning multi-agent behaviors from distributed and streaming demonstrations

S Liu, M Zhu - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
This paper considers the problem of inferring the behaviors of multiple interacting experts by
estimating their reward functions and constraints, where the distributed demonstration …
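
As an illustration of the distributed, streaming flavor only (this is the generic consensus-averaging pattern, not the paper's algorithm), each networked learner below runs a local streaming update on its private data and mixes estimates with its ring neighbors:

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([1.0, -2.0, 0.5])        # hypothetical reward weights
n_learners, dim = 4, 3
W = np.zeros((n_learners, dim))            # per-learner running estimates

# Doubly stochastic mixing matrix for a 4-node ring network.
mix = np.array([[.50, .25, .00, .25],
                [.25, .50, .25, .00],
                [.00, .25, .50, .25],
                [.25, .00, .25, .50]])

for t in range(1, 2001):
    # Each learner sees one noisy private sample of the unknown weights.
    sample = true_w + rng.normal(scale=0.5, size=(n_learners, dim))
    W += (1.0 / t) * (sample - W)          # local streaming update
    W = mix @ W                            # consensus averaging with neighbors

print("consensus estimates:\n", np.round(W, 2))
```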

Trajectory forecasts in unknown environments conditioned on grid-based plans

N Deo, MM Trivedi - arXiv preprint arXiv:2001.00735, 2020 - arxiv.org
We address the problem of forecasting pedestrian and vehicle trajectories in unknown
environments, conditioned on their past motion and scene structure. Trajectory forecasting is …
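
A sketch of just the grid "plan" stage (the occupancy grid, costs, and goal below are invented, and the learned trajectory decoder that the paper conditions on such plans is omitted): soft value iteration over a coarse grid, then sampling a cell sequence from the induced softmax policy.

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, gamma = 6, 6, 0.95
occ = np.zeros((H, W)); occ[2, 1:5] = 1.0   # a wall of occupied cells
reward = -0.1 - 5.0 * occ                   # step cost plus obstacle penalty
goal = (5, 5); reward[goal] = 1.0

moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
def nbr(r, c, m):
    r2, c2 = r + m[0], c + m[1]
    return (r2, c2) if 0 <= r2 < H and 0 <= c2 < W else (r, c)

V = np.zeros((H, W))
for _ in range(100):                        # soft value iteration
    newV = np.empty_like(V)
    for r in range(H):
        for c in range(W):
            q = [reward[nbr(r, c, m)] + gamma * V[nbr(r, c, m)]
                 for m in moves]
            newV[r, c] = np.log(np.sum(np.exp(q)))
    V = newV

cell, plan = (0, 0), [(0, 0)]               # sample one plan from (0, 0)
for _ in range(15):
    q = np.array([reward[nbr(*cell, m)] + gamma * V[nbr(*cell, m)]
                  for m in moves])
    p = np.exp(q - q.max()); p /= p.sum()   # softmax over move values
    cell = nbr(*cell, moves[int(rng.choice(4, p=p))])
    plan.append(cell)
print("sampled grid plan:", plan)
```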