Regret bounds for risk-sensitive reinforcement learning O Bastani, JY Ma, E Shen, W Xu Advances in Neural Information Processing Systems 35, 36259-36269, 2022 | 22 | 2022 |
Uniformly conservative exploration in reinforcement learning W Xu, Y Ma, K Xu, H Bastani, O Bastani International Conference on Artificial Intelligence and Statistics, 10856-10870, 2023 | 17* | 2023 |
Gaps of summands of the Zeckendorf lattice N Borade, D Cai, DZ Chang, B Fang, A Liang, SJ Miller, W Xu The Fibonacci Quarterly 58 (2), 143-156, 2020 | 9 | 2020 |
Shattering the agent-environment interface for fine-tuning inclusive language models W Xu, S Dong, D Arumugam, B Van Roy arXiv preprint arXiv:2305.11455, 2023 | 8 | 2023 |
Pearl: A Production-ready Reinforcement Learning Agent Z Zhu, RS Braz, J Bhandari, D Jiang, Y Wan, Y Efroni, L Wang, R Xu, ... Journal of Machine Learning Research (JMLR), 2024 | 6 | 2024 |
Distribution of eigenvalues of matrix ensembles arising from Wigner and palindromic Toeplitz blocks K Blackwell, N Borade, A Bose, CD VI, N Luntzlara, R Ma, SJ Miller, ... arXiv preprint arXiv:2102.05839, 2021 | 4 | 2021 |
RLHF and IIA: Perverse incentives W Xu, S Dong, X Lu, G Lam, Z Wen, B Van Roy arXiv preprint arXiv:2312.01057, 2023 | 3 | 2023 |
Distribution of eigenvalues of random real symmetric block matrices K Blackwell, N Borade, CD VI, N Luntzlara, R Ma, SJ Miller, M Wang, ... arXiv preprint arXiv:1908.03834, 2019 | 2 | 2019 |
Exploration Unbound D Arumugam, W Xu, B Van Roy RLC 2024 Finding the Frame Workshop, 2024 | | 2024 |
Posterior Sampling for Continuing Environments W Xu, S Dong, B Van Roy Reinforcement Learning Conference (RLC) 2024, 2022 | | 2022 |