A survey on offline reinforcement learning: Taxonomy, review, and open problems

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

Pessimistic value iteration for multi-task data sharing in Offline Reinforcement Learning

C Bai, L Wang, J Hao, Z Yang, B Zhao, Z Wang, X Li - Artificial Intelligence, 2024 - Elsevier
Offline Reinforcement Learning (RL) has shown promising results in learning a task-
specific policy from a fixed dataset. However, successful offline RL often relies heavily on the …

Robust quadrupedal locomotion via risk-averse policy learning

J Shi, C Bai, H He, L Han, D Wang… - … on Robotics and …, 2024 - ieeexplore.ieee.org
The robustness of legged locomotion is crucial for quadrupedal robots in challenging
terrains. Recently, Reinforcement Learning (RL) has shown promising results in legged …

Ovd-explorer: Optimism should not be the sole pursuit of exploration in noisy environments

J Liu, Z Wang, Y Zheng, J Hao, C Bai, J Ye… - Proceedings of the …, 2024 - ojs.aaai.org
In reinforcement learning, the optimism in the face of uncertainty (OFU) is a mainstream
principle for directing exploration towards less explored areas, characterized by higher …

Mild policy evaluation for offline actor–critic

L Huang, B Dong, J Lu, W Zhang - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In offline actor–critic (AC) algorithms, the distributional shift between the training data and
target policy causes optimistic value estimates for out-of-distribution (OOD) actions. This …

ACL-QL: Adaptive Conservative Level in Q-Learning for Offline Reinforcement Learning

K Wu, Y Zhao, Z Xu, Z Che, C Yin… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
Offline reinforcement learning (RL), which operates solely on static datasets without further
interactions with the environment, provides an appealing alternative to learning a safe and …

SinKD: Sinkhorn Distance Minimization for Knowledge Distillation

X Cui, Y Qin, Y Gao, E Zhang, Z Xu… - … on Neural Networks …, 2024 - ieeexplore.ieee.org
Knowledge distillation (KD) has been widely adopted to compress large language models
(LLMs). Existing KD methods investigate various divergence measures including the …

Motion planner with fixed-horizon constrained reinforcement learning for complex autonomous driving scenarios

K Lin, Y Li, S Chen, D Li, X Wu - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In autonomous driving, behavioral decision-making and trajectory planning remain huge
challenges due to the large amount of uncertainty in environments and complex interaction …

Outlier-adaptive-based non-crossing quantiles method for day-ahead electricity price forecasting

Z Chen, B Zhang, C Du, C Yang, W Gui - Applied Energy, 2025 - Elsevier
In deregulated electricity markets, accurate and reliable day-ahead electricity price
forecasting (EPF) is beneficial for hedging volatility risks, implementing dispatch controls …

A Cybersecure Distribution-Free Learning Model for Interval Forecasting of Power Load Under Cyberattacks

Z Chen, C Du, B Zhang, C Yang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Reliable interval forecasting of uncertain load with cyberattacks is critical for grid resilience
and decision-making. Nevertheless, various constraints on the predictive distribution …