A survey on model-based reinforcement learning FM Luo, T Xu, H Lai, XH Chen, W Zhang, Y Yu Science China Information Sciences 67 (2), 121101, 2024 | 142 | 2024 |
Error bounds of imitating policies and environments T Xu, Z Li, Y Yu Advances in Neural Information Processing Systems 33, 15737-15749, 2020 | 109 | 2020 |
ReMax: A simple, effective, and efficient reinforcement learning method for aligning large language models Z Li, T Xu, Y Zhang, Z Lin, Y Yu, R Sun, ZQ Luo arXiv preprint arXiv:2310.10505, 2023 | 54 | 2023 |
Error bounds of imitating policies and environments for reinforcement learning T Xu, Z Li, Y Yu IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (10), 6968 …, 2021 | 43 | 2021 |
Policy optimization in rlhf: The impact of out-of-preference data Z Li, T Xu, Y Yu arXiv preprint arXiv:2312.10584, 2023 | 21 | 2023 |
Rethinking ValueDice: Does it really improve performance? Z Li, T Xu, Y Yu, ZQ Luo arXiv preprint arXiv:2202.02468, 2022 | 18 | 2022 |
Provably efficient adversarial imitation learning with unknown transitions T Xu, Z Li, Y Yu, ZQ Luo Uncertainty in Artificial Intelligence, 2367-2378, 2023 | 16* | 2023 |
Imitation learning from imperfection: Theoretical justifications and algorithms Z Li, T Xu, Z Qin, Y Yu, ZQ Luo Advances in Neural Information Processing Systems 36, 18404-18443, 2023 | 14* | 2023 |
Reward-consistent dynamics models are strongly generalizable for offline reinforcement learning FM Luo, T Xu, X Cao, Y Yu arXiv preprint arXiv:2310.05422, 2023 | 12 | 2023 |
Understanding adversarial imitation learning in small sample regime: A stage-coupled analysis T Xu, Z Li, Y Yu, ZQ Luo arXiv preprint arXiv:2208.01899, 2022 | 9 | 2022 |
Sparsity prior regularized Q-learning for sparse action tasks JC Pang, T Xu, S Jiang, YR Liu, Y Yu CoRR, abs …, 2021 | 7* | 2021 |
Model gradient: unified model and policy learning in model-based reinforcement learning C Jia, F Zhang, T Xu, JC Pang, Z Zhang, Y Yu Frontiers of Computer Science 18 (4), 184339, 2024 | 5 | 2024 |
Policy rehearsing: Training generalizable policies for reinforcement learning C Jia, C Gao, H Yin, F Zhang, XH Chen, T Xu, L Yuan, Z Zhang, ZH Zhou, ... The Twelfth International Conference on Learning Representations, 2024 | 4 | 2024 |
Entropic distribution matching in supervised fine-tuning of LLMs: Less overfitting and better diversity Z Li, C Chen, T Xu, Z Qin, J Xiao, R Sun, ZQ Luo arXiv preprint arXiv:2408.16673, 2024 | 2 | 2024 |
Offline Imitation Learning without Auxiliary High-quality Behavior Data JJ Shao, HS Shi, T Xu, LZ Guo, Y Yu, YF Li | 2 | 2024 |
Validation on safety of the intended functionality of automated vehicles: Concept development J Hu, T Xu, X Yan, R Zhang SAE International Journal of Connected and Automated Vehicles 6 (12-06-01 …, 2022 | 2 | 2022 |
Limited preference aided imitation learning from imperfect demonstrations X Cao, FM Luo, J Ye, T Xu, Z Zhang, Y Yu Forty-first International Conference on Machine Learning, 2024 | 1 | 2024 |
When is RL better than DPO in RLHF? A Representation and Optimization Perspective Z Li, T Xu, Y Yu The Second Tiny Papers Track at ICLR 2024, 2024 | 1 | 2024 |
A Note on Target Q-learning For Solving Finite MDPs with A Generative Oracle Z Li, T Xu, Y Yu arXiv preprint arXiv:2203.11489, 2022 | 1 | 2022 |
Provably and Practically Efficient Adversarial Imitation Learning with General Function Approximation T Xu, Z Zhang, R Chen, Y Sun, Y Yu Advances in Neural Information Processing Systems 37, 66108-66146, 2025 | | 2025 |