Randomized ensembled double Q-learning: Learning fast without a model

X Chen, C Wang, Z Zhou, K Ross - arXiv preprint arXiv:2101.05982, 2021 - arxiv.org
Using a high Update-To-Data (UTD) ratio, model-based methods have recently achieved
much higher sample efficiency than previous model-free methods for continuous-action DRL …

Deep generalized Schrödinger bridge

GH Liu, T Chen, O So… - Advances in Neural …, 2022 - proceedings.neurips.cc
Mean-Field Game (MFG) serves as a crucial mathematical framework in modeling
the collective behavior of individual agents interacting stochastically with a large population …

Reinforcement learning: theory and applications in HEMS

O Al-Ani, S Das - Energies, 2022 - mdpi.com
The steep rise in reinforcement learning (RL) in various applications in energy as well as the
penetration of home automation in recent years are the motivation for this article. It surveys …

VRL3: A data-driven framework for visual deep reinforcement learning

C Wang, X Luo, K Ross, D Li - Advances in Neural …, 2022 - proceedings.neurips.cc
We propose VRL3, a powerful data-driven framework with a simple design for solving
challenging visual deep reinforcement learning (DRL) tasks. We analyze a number of major …

Human-aligned trading by imitative multi-loss reinforcement learning

ZJ Ye, BW Schuller - Expert Systems with Applications, 2023 - Elsevier
Research into algorithmic trading using reinforcement learning has been garnering
increasing popularity in recent years. While most research work focuses on solving a certain …

Deep deterministic policy gradient with compatible critic network

D Wang, M Hu - IEEE Transactions on Neural Networks and …, 2021 - ieeexplore.ieee.org
Deep deterministic policy gradient (DDPG) is a powerful reinforcement learning algorithm for
large-scale continuous controls. DDPG runs the back-propagation from the state-action …

Rate-splitting for intelligent reflecting surface-aided multiuser VR streaming

R Huang, VWS Wong, R Schober - IEEE Journal on Selected …, 2023 - ieeexplore.ieee.org
The growing demand for virtual reality (VR) applications requires wireless systems to
provide a high transmission rate to support 360-degree video streaming to multiple users …

Decision support through deep reinforcement learning for maximizing a courier's monetary gain in a meal delivery environment

W Zhou, H Fotouhi, E Miller-Hooks - Decision Support Systems, 2025 - Elsevier
Meal delivery is a fast-growing industry supported by couriers participating in the gig
economy. This paper takes a single courier's perspective and provides decision support for …

Optimal consensus control for multi‐agent systems: Multi‐step policy gradient adaptive dynamic programming method

L Ji, K Jian, C Zhang, S Yang, X Guo… - IET Control Theory & …, 2023 - Wiley Online Library
This paper presents a novel adaptive dynamic programming (ADP) method to solve the
optimal consensus problem for a class of discrete‐time multi‐agent systems with completely …

Condition State-Based Decision Making in Evolving Systems: Applications in Asset Management and Delivery

W Zhou - 2023 - search.proquest.com
Decision making in stochastic dynamic systems is significantly different from decision
making in deterministic systems in that agents need to make multiple management or …