- Academic Search

Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by 206 · All 13 versions

[PDF] arxiv.org

Causal reinforcement learning: A survey

Z Deng, J Jiang, G Long, C Zhang - ar…

… just-in-time adaptive interventions (JITAIs), typically delivered via …

Cited by 205 · All 10 versions

[PDF] ssrn.com

Customer acquisition via display advertising using multi-armed bandit experiments

EM Schwartz, ET Bradlow, PS Fader - Marketing Science, 2017 - pubsonline.informs.org

Firms using online advertising regularly run experiments with multiple versions of their ads
since they are uncertain about which ones are most effective. During a campaign, firms try to …

Cited by 389 · All 13 versions
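The bandit experiment described in this snippet can be illustrated with Thompson sampling on click/no-click outcomes. The sketch below is a plain Beta-Bernoulli sampler with uniform priors, not the model used by Schwartz et al.; the function name `thompson_sampling` and the simulated click-through rates are hypothetical and serve only to generate feedback.

```python
import random


def thompson_sampling(click_probs, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling over ad variants (illustrative sketch).

    click_probs: true click-through rate of each ad. The learner never reads
    these directly; they are only used to simulate clicks.
    Returns how many times each ad was shown.
    """
    rng = random.Random(seed)
    k = len(click_probs)
    successes = [0] * k  # observed clicks per ad
    failures = [0] * k   # observed non-clicks per ad
    pulls = [0] * k      # impressions per ad
    for _ in range(n_rounds):
        # Draw one plausible CTR per ad from its Beta(1 + clicks, 1 + non-clicks) posterior
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i]) for i in range(k)]
        # Show the ad whose sampled CTR is highest
        arm = max(range(k), key=lambda i: samples[i])
        pulls[arm] += 1
        # Simulate the user's response and update that ad's posterior counts
        if rng.random() < click_probs[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    return pulls
```

Because each ad is shown with roughly the posterior probability that it is the best one, weaker ads keep receiving occasional impressions until the data rule them out, which is how the method trades off exploration against exploitation during a live campaign.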

[PDF] neurips.cc

Optimistic posterior sampling for reinforcement learning: worst-case regret bounds

S Agrawal, R Jia - Advances in Neural Information …, 2017 - proceedings.neurips.cc

We present an algorithm based on posterior sampling (aka Thompson sampling) that
achieves near-optimal worst-case regret bounds when the underlying Markov Decision …

Cited by 254 · All 13 versions
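Posterior sampling for reinforcement learning applies the same idea to a Markov Decision Process: sample a plausible model from the posterior, plan in it, and act with the resulting policy. A minimal sketch, assuming a tabular MDP with known expected rewards, an independent Dirichlet prior on each transition row, and discounted value iteration as the planner. This is basic posterior sampling, not the optimistic multi-sample variant whose worst-case regret Agrawal and Jia analyse, and `value_iteration` and `psrl_policy` are hypothetical names.

```python
import numpy as np


def value_iteration(P, R, gamma=0.9, tol=1e-8):
    """Greedy policy and state values for a known MDP.

    P: transition probabilities, shape (S, A, S); R: expected rewards, shape (S, A).
    """
    V = np.zeros(P.shape[0])
    while True:
        # P @ V is a batch of matrix-vector products, giving shape (S, A)
        Q = R + gamma * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return Q.argmax(axis=1), V_new
        V = V_new


def psrl_policy(counts, R, rng, gamma=0.9, prior=1.0):
    """One posterior-sampling step.

    counts: observed transition counts, shape (S, A, S).
    Samples each row P[s, a] from Dirichlet(prior + counts[s, a]) and
    returns the optimal policy of the sampled MDP.
    """
    S, A, _ = counts.shape
    P = np.empty(counts.shape)
    for s in range(S):
        for a in range(A):
            P[s, a] = rng.dirichlet(prior + counts[s, a])
    policy, _ = value_iteration(P, R, gamma)
    return policy
```

In a full agent the counts would be updated from observed transitions and a fresh model sampled at the start of each episode; rarely tried state-action pairs keep wide posteriors, so sampled models occasionally favour them and exploration happens without an explicit bonus.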

[PDF] arxiv.org

Self-exploring language models: Active preference elicitation for online alignment

S Zhang, D Yu, H Sharma, H Zhong, Z Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

Preference optimization, particularly through Reinforcement Learning from Human
Feedback (RLHF), has achieved significant success in aligning Large Language Models …

Cited by 16 · All 3 versions

[PDF] mdpi.com

Reinforcement learning for efficient network penetration testing

MC Ghanem, TM Chen - Information, 2019 - mdpi.com

Penetration testing (also known as pentesting or PT) is a common practice for actively
assessing the defenses of a computer network by planning and executing all possible …

Cited by 157 · All 10 versions

[PDF] neurips.cc

Efficient model-based reinforcement learning through optimistic policy search and planning

S Curi, F Berkenkamp, A Krause - Advances in Neural …, 2020 - proceedings.neurips.cc

Model-based reinforcement learning algorithms with probabilistic dynamical
models are amongst the most data-efficient learning methods. This is often attributed to their …

Cited by 108 · All 7 versions

[PDF] informs.org

Learning to optimize via information-directed sampling

D Russo, B Van Roy - Operations Research, 2018 - pubsonline.informs.org

We propose information-directed sampling—a new approach to online optimization
problems in which a decision maker must balance between exploration and exploitation …

Cited by 146 · All 6 versions
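Information-directed sampling chooses a distribution pi over actions that minimizes the information ratio (pi · delta)^2 / (pi · gain), where delta[a] is the expected regret of action a and gain[a] is the expected information gained about the optimal action. A minimal sketch, assuming delta and gain have already been estimated from the posterior (the hard part, not shown here). It searches mixtures of at most two actions, which Russo and Van Roy show is enough to contain a minimizer, using a grid over the mixing weight; `ids_distribution` and the numbers used to exercise it are hypothetical.

```python
import itertools

import numpy as np


def ids_distribution(delta, gain, grid=1001):
    """Approximately minimize (pi @ delta)**2 / (pi @ gain) over action distributions.

    Only mixtures of at most two actions are searched. Returns
    (action probabilities, information ratio).
    """
    delta = np.asarray(delta, dtype=float)
    gain = np.asarray(gain, dtype=float)
    k = len(delta)
    qs = np.linspace(0.0, 1.0, grid)  # candidate weights on action i
    best_pi, best_ratio = None, np.inf
    for i, j in itertools.combinations_with_replacement(range(k), 2):
        d = qs * delta[i] + (1 - qs) * delta[j]  # expected regret of the mixture
        g = qs * gain[i] + (1 - qs) * gain[j]    # expected information gain of the mixture
        with np.errstate(divide="ignore", invalid="ignore"):
            ratio = np.where(g > 0, d**2 / g, np.inf)
        idx = int(np.argmin(ratio))
        if ratio[idx] < best_ratio:
            best_ratio = float(ratio[idx])
            best_pi = np.zeros(k)
            best_pi[i] += qs[idx]      # += so that i == j still sums to 1
            best_pi[j] += 1 - qs[idx]
    return best_pi, best_ratio
```

Unlike a greedy rule, this can deliberately mix in an action with low expected regret but little information, or one with higher regret but much more information, whichever mixture buys information most cheaply relative to regret.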

[PDF] arxiv.org

A Bayesian framework for digital twin-based control, monitoring, and data collection in wireless systems

C Ruah, O Simeone… - IEEE Journal on Selected …, 2023 - ieeexplore.ieee.org

Commonly adopted in the manufacturing and aerospace sectors, digital twin (DT) platforms
are increasingly seen as a promising paradigm to control, monitor, and analyze software …

Cited by 28 · All 6 versions