Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Customer acquisition via display advertising using multi-armed bandit experiments

EM Schwartz, ET Bradlow, PS Fader - Marketing Science, 2017 - pubsonline.informs.org
Firms using online advertising regularly run experiments with multiple versions of their ads
since they are uncertain about which ones are most effective. During a campaign, firms try to …

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

S Agrawal, R Jia - Advances in Neural Information …, 2017 - proceedings.neurips.cc
We present an algorithm based on posterior sampling (aka Thompson sampling) that
achieves near-optimal worst-case regret bounds when the underlying Markov Decision …

Self-exploring language models: Active preference elicitation for online alignment

S Zhang, D Yu, H Sharma, H Zhong, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Preference optimization, particularly through Reinforcement Learning from Human
Feedback (RLHF), has achieved significant success in aligning Large Language Models …

Reinforcement learning for efficient network penetration testing

MC Ghanem, TM Chen - Information, 2019 - mdpi.com
Penetration testing (also known as pentesting or PT) is a common practice for actively
assessing the defenses of a computer network by planning and executing all possible …

Efficient model-based reinforcement learning through optimistic policy search and planning

S Curi, F Berkenkamp, A Krause - Advances in Neural …, 2020 - proceedings.neurips.cc
Model-based reinforcement learning algorithms with probabilistic dynamical
models are amongst the most data-efficient learning methods. This is often attributed to their …

Learning to optimize via information-directed sampling

D Russo, B Van Roy - Operations Research, 2018 - pubsonline.informs.org
We propose information-directed sampling—a new approach to online optimization
problems in which a decision maker must balance between exploration and exploitation …

A Bayesian framework for digital twin-based control, monitoring, and data collection in wireless systems

C Ruah, O Simeone… - IEEE Journal on Selected …, 2023 - ieeexplore.ieee.org
Commonly adopted in the manufacturing and aerospace sectors, digital twin (DT) platforms
are increasingly seen as a promising paradigm to control, monitor, and analyze software …