Tian Xu
Verified email at lamda.nju.edu.cn - Homepage
Title
Cited by
Year
A survey on model-based reinforcement learning
FM Luo, T Xu, H Lai, XH Chen, W Zhang, Y Yu
Science China Information Sciences 67 (2), 121101, 2024
Cited by 141*, 2024
Error bounds of imitating policies and environments
T Xu, Z Li, Y Yu
Advances in Neural Information Processing Systems 33, 15737-15749, 2020
Cited by 116, 2020
ReMax: A simple, effective, and efficient reinforcement learning method for aligning large language models
Z Li, T Xu, Y Zhang, Z Lin, Y Yu, R Sun, ZQ Luo
Forty-first International Conference on Machine Learning, 2023
Cited by 44*, 2023
Error bounds of imitating policies and environments for reinforcement learning
T Xu, Z Li, Y Yu
IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (10), 6968 …, 2021
Cited by 40, 2021
Rethinking ValueDice: Does it really improve performance?
Z Li, T Xu, Y Yu, ZQ Luo
arXiv preprint arXiv:2202.02468, 2022
Cited by 14, 2022
Reward-consistent dynamics models are strongly generalizable for offline reinforcement learning
FM Luo, T Xu, X Cao, Y Yu
arXiv preprint arXiv:2310.05422, 2023
Cited by 12, 2023
Policy optimization in RLHF: The impact of out-of-preference data
Z Li, T Xu, Y Yu
arXiv preprint arXiv:2312.10584, 2023
Cited by 11, 2023
Imitation learning from imperfection: Theoretical justifications and algorithms
Z Li, T Xu, Z Qin, Y Yu, ZQ Luo
Advances in Neural Information Processing Systems 36, 18404-18443, 2023
Cited by 10, 2023
Provably efficient adversarial imitation learning with unknown transitions
T Xu, Z Li, Y Yu, ZQ Luo
Uncertainty in Artificial Intelligence, 2367-2378, 2023
Cited by 9, 2023
Policy optimization in RLHF: The impact of out-of-preference data
Z Li, T Xu, Y Yu
arXiv preprint arXiv:2312.10584, 2023
Cited by 8, 2023
Understanding adversarial imitation learning in small sample regime: A stage-coupled analysis
T Xu, Z Li, Y Yu, ZQ Luo
arXiv preprint arXiv:2208.01899, 2022
Cited by 6, 2022
Testing and evaluation of autonomous vehicles based on safety of the intended functionality
J Hu, T Xu, R Zhang
2021 6th International Conference on Transportation Information and Safety …, 2021
Cited by 6, 2021
On generalization of adversarial imitation learning and beyond
T Xu, Z Li, Y Yu, ZQ Luo
arXiv preprint arXiv:2106.10424, 2021
Cited by 5, 2021
Model gradient: unified model and policy learning in model-based reinforcement learning
C Jia, F Zhang, T Xu, JC Pang, Z Zhang, Y Yu
Frontiers of Computer Science 18 (4), 184339, 2024
Cited by 4, 2024
Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning
C Jia, C Gao, H Yin, F Zhang, XH Chen, T Xu, L Yuan, Z Zhang, ZH Zhou, ...
The Twelfth International Conference on Learning Representations, 2024
Cited by 3, 2024
Entropic distribution matching in supervised fine-tuning of LLMs: Less overfitting and better diversity
Z Li, C Chen, T Xu, Z Qin, J Xiao, R Sun, ZQ Luo
arXiv preprint arXiv:2408.16673, 2024
Cited by 2, 2024
Theoretical analysis of offline imitation with supplementary dataset
Z Li, T Xu, Y Yu, ZQ Luo
arXiv preprint arXiv:2301.11687, 2023
Cited by 2, 2023
Validation on safety of the intended functionality of automated vehicles: Concept development
J Hu, T Xu, X Yan, R Zhang
SAE International Journal of Connected and Automated Vehicles 6 (12-06-01 …, 2022
Cited by 2, 2022
Nearly Minimax Optimal Adversarial Imitation Learning with Known and Unknown Transitions
T Xu, Z Li, Y Yu
CoRR abs/2106.10424, 2021
Cited by 2, 2021
Offline Imitation Learning without Auxiliary High-quality Behavior Data
JJ Shao, HS Shi, T Xu, LZ Guo, Y Yu, YF Li
Cited by 2
Articles 1–20