| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Meta-Reward-Net: Implicitly differentiable reward learning for preference-based reinforcement learning | R Liu, F Bai, Y Du, Y Yang | Advances in Neural Information Processing Systems 35, 22270-22284, 2022 | 57 | 2022 |
| PiCor: Multi-task deep reinforcement learning with policy correction | F Bai, H Zhang, T Tao, Z Wu, Y Wang, B Xu | Proceedings of the AAAI Conference on Artificial Intelligence 37 (6), 6728-6736, 2023 | 16 | 2023 |
| Measuring value understanding in language models through discriminator-critique gap | Z Zhang, F Bai, J Gao, Y Yang | arXiv preprint arXiv:2310.00378, 2023 | 8 | 2023 |
| PEARL: Zero-shot cross-task preference alignment and robust reward learning for robotic manipulation | R Liu, Y Du, F Bai, J Lyu, X Li | International Conference on Machine Learning, 2024 | 6* | 2024 |
| Efficient preference-based reinforcement learning via aligned experience estimation | F Bai, R Zhao, H Zhang, S Cui, Y Wen, Y Yang, B Xu, L Han | arXiv preprint arXiv:2405.18688, 2024 | 5 | 2024 |
| RAT: Adversarial attacks on deep reinforcement agents for targeted behaviors | F Bai, R Liu, Y Du, Y Wen, Y Yang | arXiv preprint arXiv:2412.10713, 2024 | 2 | 2024 |
| Efficient model-agnostic alignment via Bayesian persuasion | F Bai, M Wang, Z Zhang, B Chen, Y Xu, Y Wen, Y Yang | arXiv preprint arXiv:2405.18718, 2024 | 2 | 2024 |
| Incentive compatibility for AI alignment in sociotechnical systems: Positions and prospects | Z Zhang, F Bai, M Wang, H Ye, C Ma, Y Yang | arXiv preprint arXiv:2402.12907, 2024 | 2 | 2024 |
| GRAIT: Gradient-driven refusal-aware instruction tuning for effective hallucination mitigation | R Zhu, Z Jiang, J Wu, Z Ma, J Song, F Bai, D Lin, L Wu, C He | arXiv preprint arXiv:2502.05911, 2025 | | 2025 |
| β-DQN: Improving deep Q-learning by evolving the behavior | H Zhang, F Bai, C Xiao, C Gao, B Xu, M Müller | arXiv preprint arXiv:2501.00913, 2025 | | 2025 |