Google Scholar

A review of safe reinforcement learning: Methods, theory and applications

S Gu, L Yang, Y Du, G Chen, F Walter, J Wang… - ar** multi-robot control methods from the perspective …

Cited by 84; all 7 versions

Constrained update projection approach to safe policy optimization

L Yang, J Ji, J Dai, L Zhang, B Zhou… - Advances in …, 2022 - proceedings.neurips.cc

Safe reinforcement learning (RL) studies problems where an intelligent agent has to not only
maximize reward but also avoid exploring unsafe areas. In this study, we propose CUP, a …

Cited by 58; all 10 versions

A general sample complexity analysis of vanilla policy gradient

R Yuan, RM Gower, A Lazaric - International Conference on …, 2022 - proceedings.mlr.press

We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in
non-convex optimization to obtain convergence and sample complexity guarantees for the …

Cited by 74; all 10 versions

Sample efficient policy gradient methods with recursive variance reduction

P Xu, F Gao, Q Gu - arXiv preprint arXiv:1909.08610, 2019 - arxiv.org

Improving the sample efficiency in reinforcement learning has been a long-standing
research problem. In this work, we aim to reduce the sample complexity of existing policy …

Cited by 111; all 7 versions

A novel framework for policy mirror descent with general parameterization and linear convergence

C Alfano, R Yuan, P Rebeschini - Advances in Neural …, 2023 - proceedings.neurips.cc

Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …

Cited by 23; all 10 versions

Off-policy proximal policy optimization

W Meng, Q Zheng, G Pan, Y Yin - … of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org

Proximal Policy Optimization (PPO) is an important reinforcement learning method,
which has achieved great success in sequential decision-making problems. However, PPO …

Cited by 14; all 2 versions

Wastewater treatment monitoring: Fault detection in sensors using transductive learning and improved reinforcement learning

J Yang, K Tian, H Zhao, Z Feng, S Bourouis… - Expert Systems with …, 2025 - Elsevier

Wastewater treatment plants (WWTPs) increasingly utilize sensors to optimize operations
and ensure treated water quality. These sensors' rich datasets are well-suited for automated …

Cited by 2

Stock market prediction with transductive long short-term memory and social media sentiment analysis

A Peivandizadeh, S Hatami, A Nakhjavani… - IEEE …, 2024 - ieeexplore.ieee.org

In an era dominated by digital communication, the vast amounts of data generated from
social media and financial markets present unique opportunities and challenges for …

Cited by 6; all 4 versions

Seismonet: A proximal policy optimization-based earthquake early warning system using dilated convolution layers and online data augmentation

S Banar, R Mohammadi - Expert Systems with Applications, 2024 - Elsevier

In seismic safety, Earthquake Early Warning (EEW) systems are indispensable for
mitigating earthquake hazards. These systems strive to quickly evaluate earthquake …

Cited by 2

Saved in "My library"

Policy optimization with stochastic mirror descent

A review of safe reinforcement learning: Methods, theory and applications
