- Academic Search

A definition of continual reinforcement learning

D Abel, A Barreto, B Van Roy… - Advances in …, 2023 - proceedings.neurips.cc

In a standard view of the reinforcement learning problem, an agent's goal is to efficiently
identify a policy that maximizes long-term reward. However, this perspective is based on a …

Cited by 72 · All 8 versions

[PDF] mlr.press

Distributionally Robust Q-Learning

Z Liu, Q Bai, J Blanchet, P Dong, W Xu… - International …, 2022 - proceedings.mlr.press

Reinforcement learning (RL) has demonstrated remarkable achievements in simulated
environments. However, carrying this success to real environments requires the important …

Cited by 51 · All 3 versions

[PDF] arxiv.org

Fine-tuning language models with advantage-induced policy alignment

B Zhu, H Sharma, FV Frujeri, S Dong, C Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) has emerged as a reliable approach
to aligning large language models (LLMs) to human preferences. Among the plethora of …

Cited by 37 · All 4 versions

[PDF] mlr.press

Settling the reward hypothesis

M Bowling, JD Martin, D Abel… - … on Machine Learning, 2023 - proceedings.mlr.press

The reward hypothesis posits that, "all of what we mean by goals and purposes can be well
thought of as maximization of the expected value of the cumulative sum of a received scalar …

Cited by 40 · All 8 versions

[PDF] arxiv.org

Reinforcement learning: An overview

K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org

This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …

Cited by 1 · All 2 versions

[PDF] arxiv.org

Continual learning as computationally constrained reinforcement learning

S Kumar, H Marklund, A Rao, Y Zhu, HJ Jeon… - arXiv preprint arXiv …, 2023 - arxiv.org

An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills
over a long lifetime could advance the frontier of artificial intelligence capabilities. The …

Cited by 19 · All 2 versions

[PDF] arxiv.org

Partially Observed Optimal Stochastic Control: Regularity, Optimality, Approximations, and Learning

AD Kara, S Yuksel - arXiv preprint arXiv:2412.06735, 2024 - arxiv.org

In this review/tutorial article, we present recent progress on optimal control of partially
observed Markov Decision Processes (POMDPs). We first present regularity and continuity …

Cited by 1

[PDF] neurips.cc

Deciding what to model: Value-equivalent sampling for reinforcement learning

D Arumugam, B Van Roy - Advances in neural information …, 2022 - proceedings.neurips.cc

The quintessential model-based reinforcement-learning agent iteratively refines its
estimates or prior beliefs about the true underlying model of the environment. Recent …

Cited by 15 · All 7 versions

[PDF] arxiv.org

Three dogmas of reinforcement learning

D Abel, MK Ho, A Harutyunyan - arXiv preprint arXiv:2407.10583, 2024 - arxiv.org

Modern reinforcement learning has been conditioned by at least three dogmas. The first is
the environment spotlight, which refers to our tendency to focus on modeling environments …

Cited by 3 · All 6 versions

[PDF] arxiv.org

Satisficing exploration for deep reinforcement learning

D Arumugam, S Kumar, R Gummadi… - arXiv preprint arXiv …, 2024 - arxiv.org

A default assumption in the design of reinforcement-learning algorithms is that a decision-making
agent always explores to learn optimal behavior. In sufficiently complex …

Cited by 2 · All 4 versions

Saved to My library

Simple agent, complex environment: Efficient reinforcement learning with agent states

A definition of continual reinforcement learning

Distributionally Robust Q-Learning

Fine-tuning language models with advantage-induced policy alignment

Settling the reward hypothesis

Reinforcement learning: An overview

Continual learning as computationally constrained reinforcement learning

Partially Observed Optimal Stochastic Control: Regularity, Optimality, Approximations, and Learning

Deciding what to model: Value-equivalent sampling for reinforcement learning

Three dogmas of reinforcement learning

Satisficing exploration for deep reinforcement learning