Reinforcement learning in urban network traffic signal control: A systematic literature review

M Noaeen, A Naik, L Goodman, J Crebo, T Abrar… - Expert Systems with …, 2022 - Elsevier
Improvement of traffic signal control (TSC) efficiency has been found to lead to improved
urban transportation and enhanced quality of life. Recently, the use of reinforcement …

A gentle introduction to reinforcement learning and its application in different fields

M Naeem, STH Rizvi, A Coronato - IEEE Access, 2020 - ieeexplore.ieee.org
Due to the recent progress in Deep Neural Networks, Reinforcement Learning (RL) has
become one of the most important and useful technology. It is a learning method where a …

Deep reinforcement learning

SE Li - Reinforcement learning for sequential decision and …, 2023 - Springer
Similar to humans, RL agents use interactive learning to successfully obtain satisfactory
decision strategies. However, in many cases, it is desirable to learn directly from …

Modeling recommender ecosystems: Research challenges at the intersection of mechanism design, reinforcement learning and generative models

C Boutilier, M Mladenov, G Tennenholtz - arXiv preprint arXiv:2309.06375, 2023 - arxiv.org
Modern recommender systems lie at the heart of complex ecosystems that couple the
behavior of users, content providers, advertisers, and other actors. Despite this, the focus of …

Risk-averse offline reinforcement learning

NA Urpí, S Curi, A Krause - arXiv preprint arXiv:2102.05371, 2021 - arxiv.org
Training Reinforcement Learning (RL) agents in high-stakes applications might be too
prohibitive due to the risk associated to exploration. Thus, the agent can only use data …

Attacker-centric view of a detection game against advanced persistent threats

L Xiao, D Xu, NB Mandayam… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Advanced persistent threats (APTs) are a major threat to cyber-security, causing significant
financial and privacy losses each year. In this paper, cumulative prospect theory (CPT) is …

Off-policy risk assessment in contextual bandits

A Huang, L Leqi, Z Lipton… - Advances in Neural …, 2021 - proceedings.neurips.cc
Even when unable to run experiments, practitioners can evaluate prospective policies, using
previously logged data. However, while the bandits literature has adopted a diverse set of …

Stochastic games for the smart grid energy management with prospect prosumers

SR Etesami, W Saad, NB Mandayam… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
In this paper, the problem of the smart grid energy management under stochastic dynamics
is investigated. In the considered model, at the demand side, it is assumed that customers …

Robust quadrupedal locomotion via risk-averse policy learning

J Shi, C Bai, H He, L Han, D Wang… - … on Robotics and …, 2024 - ieeexplore.ieee.org
The robustness of legged locomotion is crucial for quadrupedal robots in challenging
terrains. Recently, Reinforcement Learning (RL) has shown promising results in legged …

Concentration of risk measures: A Wasserstein distance approach

SP Bhat, P LA - Advances in neural information processing …, 2019 - proceedings.neurips.cc
Known finite-sample concentration bounds for the Wasserstein distance between the
empirical and true distribution of a random variable are used to derive a two-sided …