Recent advances in reinforcement learning in finance
The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …
Causal reinforcement learning: A survey
Customer acquisition via display advertising using multi-armed bandit experiments
Firms using online advertising regularly run experiments with multiple versions of their ads
since they are uncertain about which ones are most effective. During a campaign, firms try to …
Optimistic posterior sampling for reinforcement learning: worst-case regret bounds
S Agrawal, R Jia - Advances in Neural Information …, 2017 - proceedings.neurips.cc
We present an algorithm based on posterior sampling (aka Thompson sampling) that
achieves near-optimal worst-case regret bounds when the underlying Markov Decision …
Self-exploring language models: Active preference elicitation for online alignment
Preference optimization, particularly through Reinforcement Learning from Human
Feedback (RLHF), has achieved significant success in aligning Large Language Models …
Reinforcement learning for efficient network penetration testing
Penetration testing (also known as pentesting or PT) is a common practice for actively
assessing the defenses of a computer network by planning and executing all possible …
Efficient model-based reinforcement learning through optimistic policy search and planning
Model-based reinforcement learning algorithms with probabilistic dynamical
models are amongst the most data-efficient learning methods. This is often attributed to their …
Learning to optimize via information-directed sampling
We propose information-directed sampling—a new approach to online optimization
problems in which a decision maker must balance between exploration and exploitation …
A Bayesian framework for digital twin-based control, monitoring, and data collection in wireless systems
Commonly adopted in the manufacturing and aerospace sectors, digital twin (DT) platforms
are increasingly seen as a promising paradigm to control, monitor, and analyze software …