| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| AI Alignment: A Comprehensive Survey | J Ji*, T Qiu*, B Chen*, B Zhang*, H Lou, K Wang, Y Duan, Z He, J Zhou, ... | arXiv preprint arXiv:2310.19852 | 222 | 2023 |
| Aligner: Efficient Alignment by Learning to Correct | J Ji, B Chen, H Lou, D Hong, B Zhang, X Pan, J Dai, T Qiu, Y Yang | NeurIPS 2024 (Oral) | 49* | 2024 |
| PKU-SafeRLHF: Towards Multi-Level Safety Alignment for LLMs with Human Preference | J Ji, D Hong, B Zhang, B Chen, J Dai, B Zheng, T Qiu, B Li, Y Yang | arXiv preprint arXiv:2406.15513 | 21* | 2024 |
| Language Models Resist Alignment | J Ji*, K Wang*, T Qiu*, B Chen*, J Zhou, C Li, H Lou, Y Yang | arXiv preprint arXiv:2406.06144 | 5 | 2024 |
| Reward Generalization in RLHF: A Topological Perspective | T Qiu✞, F Zeng*, J Ji*, D Yan*, K Wang, J Zhou, Y Han, J Dai, X Pan, ... | arXiv preprint arXiv:2402.10184 | 4 | 2024 |
| Align Anything: Training All-Modality Models to Follow Instructions with Language Feedback | J Ji, J Zhou, H Lou, B Chen, D Hong, X Wang, W Chen, K Wang, R Pan, ... | arXiv preprint arXiv:2412.15838 | 2 | 2024 |
| ProgressGym: Alignment with a Millennium of Moral Progress | T Qiu✞, Y Zhang*, X Huang, JX Li, J Ji, Y Yang | NeurIPS 2024 (Spotlight, Track on Datasets and Benchmarks) | 1 | 2024 |
| Representative Social Choice: From Learning Theory to AI Alignment | T Qiu | NeurIPS 2024 Pluralistic Alignment Workshop (Best Paper Award) | | 2024 |