Chenlu Ye
Verified email at illinois.edu
Title
Cited by
Year
Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint
W Xiong, H Dong, C Ye, Z Wang, H Zhong, H Ji, N Jiang, T Zhang
arXiv preprint arXiv:2312.11456, 2023
Cited by: 151*; Year: 2023
Online iterative reinforcement learning from human feedback with general preference model
C Ye, W Xiong, Y Zhang, H Dong, N Jiang, T Zhang
Advances in Neural Information Processing Systems 37, 81773-81807, 2025
Cited by: 36*; Year: 2025
Corruption-robust algorithms with uncertainty weighting for nonlinear contextual bandits and Markov decision processes
C Ye, W Xiong, Q Gu, T Zhang
International Conference on Machine Learning, 39834-39863, 2023
Cited by: 28; Year: 2023
Corruption-Robust Offline Reinforcement Learning with General Function Approximation
C Ye, R Yang, Q Gu, T Zhang
Neural Information Processing Systems, 2023
Cited by: 18; Year: 2023
Towards robust model-based reinforcement learning against adversarial corruption
C Ye, J He, Q Gu, T Zhang
arXiv preprint arXiv:2402.08991, 2024
Cited by: 5; Year: 2024
Sharp analysis for KL-regularized contextual bandits and RLHF
H Zhao, C Ye, Q Gu, T Zhang
arXiv preprint arXiv:2411.04625, 2024
Cited by: 4; Year: 2024
Optimal sample selection through uncertainty estimation and its application in deep learning
Y Lin, C Liu, C Ye, Q Lian, Y Yao, T Zhang
arXiv preprint arXiv:2309.02476, 2023
Cited by: 3; Year: 2023
Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks
J Fan, Z Wang, Z Yang, C Ye
arXiv preprint arXiv:2311.13180, 2023
Cited by: 1; Year: 2023
Logarithmic Regret for Online KL-Regularized Reinforcement Learning
H Zhao, C Ye, W Xiong, Q Gu, T Zhang
arXiv preprint arXiv:2502.07460, 2025
Year: 2025
Catoni Contextual Bandits are Robust to Heavy-tailed Rewards
C Ye, Y Jin, A Agarwal, T Zhang
arXiv preprint arXiv:2502.02486, 2025
Year: 2025