Xingzhou Lou
Institute of Automation, Chinese Academy of Sciences
Verified email at ia.ac.cn
Title · Cited by · Year
PECAN: Leveraging policy ensemble for context-aware zero-shot human-AI coordination
X Lou, J Guo, J Zhang, J Wang, K Huang, Y Du
arXiv preprint arXiv:2301.06387, 2023
Cited by 24 · 2023
An efficient end-to-end training approach for zero-shot human-AI coordination
X Yan, J Guo, X Lou, J Wang, H Zhang, Y Du
Advances in Neural Information Processing Systems 36, 2636-2658, 2023
Cited by 12 · 2023
Uncertainty-aware reward model: Teaching reward models to know what is unknown
X Lou, D Yan, W Shen, Y Yan, J Xie, J Zhang
arXiv preprint arXiv:2410.00847, 2024
Cited by 11 · 2024
Offline reinforcement learning with representations for actions
X Lou, Q Yin, J Zhang, C Yu, Z He, N Cheng, K Huang
Information Sciences 610, 746-758, 2022
Cited by 9 · 2022
SPO: Multi-dimensional preference sequential alignment with implicit reward modeling
X Lou, J Zhang, J Xie, L Liu, D Yan, K Huang
arXiv preprint arXiv:2405.12739, 2024
Cited by 6 · 2024
Safe reinforcement learning with free-form natural language constraints and pre-trained language models
X Lou, J Zhang, Z Wang, K Huang, Y Du
arXiv preprint arXiv:2401.07553, 2024
Cited by 4 · 2024
Position: Foundation agents as the paradigm shift for decision making
X Liu, X Lou, J Jiao, J Zhang
arXiv preprint arXiv:2405.17009, 2024
Cited by 3 · 2024
Leveraging Joint-Action Embedding in Multiagent Reinforcement Learning for Cooperative Games
X Lou, J Zhang, Y Du, C Yu, Z He, K Huang
IEEE Transactions on Games 16 (2), 470-482, 2023
Cited by 3 · 2023
Reward-robust RLHF in LLMs
Y Yan, X Lou, J Li, Y Zhang, J Xie, C Yu, Y Wang, D Yan, Y Shen
arXiv preprint arXiv:2409.15360, 2024
Cited by 2 · 2024
TAPE: leveraging agent topology for cooperative multi-agent policy gradient
X Lou, J Zhang, TJ Norman, K Huang, Y Du
Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 17496 …, 2024
2024
SPO: Multi-Dimensional Preference Alignment With Implicit Reward Modeling
X Lou, J Zhang, J Xie, L Liu, D Yan, K Huang