Heyang Zhao
Verified email at cs.ucla.edu
Title
Cited by
Year
Nearly minimax optimal reinforcement learning for linear Markov decision processes
J He, H Zhao, D Zhou, Q Gu
International Conference on Machine Learning, 12790-12822, 2023
Cited by: 57, Year: 2023
Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency
H Zhao, J He, D Zhou, T Zhang, Q Gu
The Thirty Sixth Annual Conference on Learning Theory, 2023
Cited by: 32, Year: 2023
Linear contextual bandits with adversarial corruptions
H Zhao, D Zhou, Q Gu
arXiv preprint arXiv:2110.12615, 2021
Cited by: 23, Year: 2021
A nearly optimal and low-switching algorithm for reinforcement learning with general function approximation
H Zhao, J He, Q Gu
arXiv preprint arXiv:2311.15238, 2023
Cited by: 12, Year: 2023
Variance-aware regret bounds for stochastic contextual dueling bandits
Q Di, T Jin, Y Wu, H Zhao, F Farnoud, Q Gu
arXiv preprint arXiv:2310.00968, 2023
Cited by: 12, Year: 2023
Optimal online generalized linear regression with stochastic noise and its application to heteroscedastic bandits
H Zhao, D Zhou, J He, Q Gu
International Conference on Machine Learning, 42259-42279, 2023
Cited by: 11*, Year: 2023
Pessimistic nonlinear least-squares value iteration for offline reinforcement learning
Q Di, H Zhao, J He, Q Gu
arXiv preprint arXiv:2310.01380, 2023
Cited by: 9, Year: 2023
Feel-good Thompson sampling for contextual dueling bandits
X Li, H Zhao, Q Gu
arXiv preprint arXiv:2404.06013, 2024
Cited by: 8, Year: 2024
Sharp analysis for KL-regularized contextual bandits and RLHF
H Zhao, C Ye, Q Gu, T Zhang
arXiv preprint arXiv:2411.04625, 2024
Cited by: 3, Year: 2024
Logarithmic Regret for Online KL-Regularized Reinforcement Learning
H Zhao, C Ye, W Xiong, Q Gu, T Zhang
arXiv preprint arXiv:2502.07460, 2025
Year: 2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability
Q Zhao, K Ji, H Zhao, T Zhang, Q Gu
arXiv preprint arXiv:2502.06051, 2025
Year: 2025
Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration
H Zhao, X Yu, DM Bossens, I Tsang, Q Gu
The Thirteenth International Conference on Learning Representations