Google Scholar

Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by 219 (14 versions)

An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org

Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Cited by 352 (2 versions)

On the theory of policy gradient methods: Optimality, approximation, and distribution shift

A Agarwal, SM Kakade, JD Lee, G Mahajan - Journal of Machine Learning …, 2021 - jmlr.org

Policy gradient methods are among the most effective methods in challenging reinforcement
learning problems with large state and/or action spaces. However, little is known about even …

Cited by 516 (13 versions)

Optimality and approximation with policy gradient methods in Markov decision processes

A Agarwal, SM Kakade, JD Lee… - … on Learning Theory, 2020 - proceedings.mlr.press

Policy gradient (PG) methods are among the most effective methods in challenging
reinforcement learning problems with large state and/or action spaces. However, little is …

Cited by 400 (3 versions)

Hybrid RL: Using both offline and online data can make RL efficient

Y Song, Y Zhou, A Sekhari, JA Bagnell… - arXiv preprint arXiv …, 2022 - arxiv.org

We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has
access to an offline dataset and the ability to collect experience via real-world online …

Cited by 93 (5 versions)

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press

While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

Cited by 324 (10 versions)

Independent policy gradient for large-scale Markov potential games: Sharper rates, function approximation, and game-agnostic convergence

D Ding, CY Wei, K Zhang… - … Conference on Machine …, 2022 - proceedings.mlr.press

We examine global non-asymptotic convergence properties of policy gradient methods for
multi-agent reinforcement learning (RL) problems in Markov potential games (MPGs). To …

Cited by 87 (9 versions)

Apple Intelligence foundation language models

T Gunter, Z Wang, C Wang, R Pang… - arXiv preprint arXiv …, 2024 - arxiv.org

We present foundation language models developed to power Apple Intelligence features,
including a ~3 billion parameter model designed to run efficiently on devices and a large …

Cited by 39 (3 versions)

PC-PG: Policy cover directed exploration for provable policy gradient learning

A Agarwal, M Henaff, S Kakade… - Advances in neural …, 2020 - proceedings.neurips.cc

Direct policy gradient methods for reinforcement learning are a successful approach for a
variety of reasons: they are model free, they directly optimize the performance metric of …

Cited by 145 (11 versions)

Joint optimization of preventive maintenance and production scheduling for multi-state production systems based on reinforcement learning

H Yang, W Li, B Wang - Reliability Engineering & System Safety, 2021 - Elsevier

Preventive maintenance and production scheduling are two important and interactive
activities in production systems. In this work, the integrated optimization problem of …

Cited by 99 (5 versions)

Politex: Regret bounds for policy iteration using expert prediction

Recent advances in reinforcement learning in finance
