Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by 211; 13 versions

[PDF] mlr.press

Guarantees for epsilon-greedy reinforcement learning with function approximation

C Dann, Y Mansour, M Mohri… - International …, 2022 - proceedings.mlr.press

Myopic exploration policies such as epsilon-greedy, softmax, or Gaussian noise fail to
explore efficiently in some reinforcement learning tasks and yet, they perform well in many …

Cited by 73; 6 versions

[PDF] arxiv.org

Motif: Intrinsic motivation from artificial intelligence feedback

M Klissarov, P D'Oro, S Sodhani, R Raileanu… - arXiv preprint arXiv …, 2023 - arxiv.org

Exploring rich environments and evaluating one's actions without prior knowledge is
immensely challenging. In this paper, we propose Motif, a general method to interface such …

Cited by 46; 6 versions

[PDF] neurips.cc

On the importance of exploration for generalization in reinforcement learning

Y Jiang, JZ Kolter, R Raileanu - Advances in Neural …, 2024 - proceedings.neurips.cc

Existing approaches for improving generalization in deep reinforcement learning (RL) have
mostly focused on representation learning, neglecting RL-specific aspects such as …

Cited by 19; 5 versions

[PDF] arxiv.org

Reinforcement learning: An overview

K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org

This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …

Cited by 1; 2 versions

[PDF] jmlr.org

Temporal abstraction in reinforcement learning with the successor representation

MC Machado, A Barreto, D Precup… - Journal of Machine …, 2023 - jmlr.org

Reasoning at multiple levels of temporal abstraction is one of the key attributes of
intelligence. In reinforcement learning, this is often modeled through temporally extended …

Cited by 57; 3 versions

[PDF] arxiv.org

Deep laplacian-based options for temporally-extended exploration

M Klissarov, MC Machado - arXiv preprint arXiv:2301.11181, 2023 - arxiv.org

Selecting exploratory actions that generate a rich stream of experience for better learning is
a fundamental challenge in reinforcement learning (RL). An approach to tackle this problem …

Cited by 20; 6 versions

[PDF] mdpi.com

UAV path planning optimization strategy: Considerations of urban morphology, microclimate, and energy efficiency using Q-learning algorithm

A Souto, R Alfaia, E Cardoso, J Araújo, C Francês - Drones, 2023 - mdpi.com

The use of unmanned aerial vehicles (UAVs) has been suggested as a potential
communications alternative due to their fast implantation, which makes this resource an …

Cited by 22; 5 versions

[PDF] neurips.cc

The phenomenon of policy churn

T Schaul, A Barreto, J Quan… - Advances in Neural …, 2022 - proceedings.neurips.cc

We identify and study the phenomenon of policy churn, that is, the rapid change of the
greedy policy in value-based reinforcement learning. Policy churn operates at a surprisingly …

Cited by 25; 5 versions

[PDF] mlr.press

Timing as an Action: Learning When to Observe and Act

H Zhou, A Huang, K Azizzadenesheli… - International …, 2024 - proceedings.mlr.press

In standard reinforcement learning setups, the agent receives observations and performs
actions at evenly spaced intervals. However, in many real-world settings, observations are …

Cited by 1

Temporally-extended ε-greedy exploration

Recent advances in reinforcement learning in finance