Heyang Zhao

Cited by

	All	Since 2020
Citations	167	167
h-index	8	8
i10-index	6	6

Citations per year: 2022: 7, 2023: 51, 2024: 90, 2025: 19

Public access

3 articles available
0 articles not available

Based on funding mandates

Co-authors

Quanquan Gu, Associate Professor of Computer Science, UCLA. Verified email at cs.ucla.edu
Jiafan He, PhD student, Department of Computer Science, UCLA. Verified email at ucla.edu
Dongruo Zhou, Indiana University Bloomington. Verified email at iu.edu
Tong Zhang, UIUC. Verified email at tongzhang-ml.org
QIWEI DI, PhD student, Department of Computer Science, University of California, Los Angeles. Verified email at cs.ucla.edu
Farzad Farnoud, University of Virginia. Verified email at virginia.edu
Tao Jin, PhD Student, University of Virginia. Verified email at virginia.edu
Yue Wu, Postdoctoral Research Fellow, Princeton University. Verified email at ucla.edu
XUHENG LI, Department of Computer Science, University of California, Los Angeles. Verified email at ucla.edu
Chenlu Ye, Computer Science, University of Illinois Urbana-Champaign. Verified email at illinois.edu

Heyang Zhao

UCLA

Verified email at cs.ucla.edu

Machine Learning


Title	Cited by	Year
Nearly minimax optimal reinforcement learning for linear Markov decision processes. J He, H Zhao, D Zhou, Q Gu. International Conference on Machine Learning, 12790-12822, 2023	57	2023
Variance-dependent regret bounds for linear bandits and reinforcement learning: Adaptivity and computational efficiency. H Zhao, J He, D Zhou, T Zhang, Q Gu. The Thirty Sixth Annual Conference on Learning Theory, 2023	32	2023
Linear contextual bandits with adversarial corruptions. H Zhao, D Zhou, Q Gu. arXiv preprint arXiv:2110.12615, 2021	23	2021
A nearly optimal and low-switching algorithm for reinforcement learning with general function approximation. H Zhao, J He, Q Gu. arXiv preprint arXiv:2311.15238, 2023	12	2023
Variance-aware regret bounds for stochastic contextual dueling bandits. Q Di, T Jin, Y Wu, H Zhao, F Farnoud, Q Gu. arXiv preprint arXiv:2310.00968, 2023	12	2023
Optimal online generalized linear regression with stochastic noise and its application to heteroscedastic bandits. H Zhao, D Zhou, J He, Q Gu. International Conference on Machine Learning, 42259-42279, 2023	11*	2023
Pessimistic nonlinear least-squares value iteration for offline reinforcement learning. Q Di, H Zhao, J He, Q Gu. arXiv preprint arXiv:2310.01380, 2023	9	2023
Feel-good Thompson sampling for contextual dueling bandits. X Li, H Zhao, Q Gu. arXiv preprint arXiv:2404.06013, 2024	8	2024
Sharp analysis for KL-regularized contextual bandits and RLHF. H Zhao, C Ye, Q Gu, T Zhang. arXiv preprint arXiv:2411.04625, 2024	3	2024
Logarithmic Regret for Online KL-Regularized Reinforcement Learning. H Zhao, C Ye, W Xiong, Q Gu, T Zhang. arXiv preprint arXiv:2502.07460, 2025		2025
Nearly Optimal Sample Complexity of Offline KL-Regularized Contextual Bandits under Single-Policy Concentrability. Q Zhao, K Ji, H Zhao, T Zhang, Q Gu. arXiv preprint arXiv:2502.06051, 2025		2025
Beyond-Expert Performance with Limited Demonstrations: Efficient Imitation Learning with Double Exploration. H Zhao, X Yu, DM Bossens, I Tsang, Q Gu. The Thirteenth International Conference on Learning Representations		
