Aligner: Achieving efficient alignment through weak-to-strong correction. J. Ji, B. Chen, H. Lou, D. Hong, B. Zhang, X. Pan, J. Dai, Y. Yang. arXiv preprint arXiv:2402.02416, 2024. Cited by 47.
PKU-SafeRLHF: A safety alignment preference dataset for Llama family models. J. Ji, D. Hong, B. Zhang, B. Chen, J. Dai, B. Zheng, T. Qiu, B. Li, Y. Yang. CoRR, 2024. Cited by 14.
PKU-SafeRLHF: Towards multi-level safety alignment for LLMs with human preference. J. Ji, D. Hong, B. Zhang, B. Chen, J. Dai, B. Zheng, T. Qiu, B. Li, Y. Yang. arXiv preprint arXiv:2406.15513, 2024. Cited by 9.
Aligner: Efficient alignment by learning to correct. J. Ji, B. Chen, H. Lou, D. Hong, B. Zhang, X. Pan, J. Dai, T. Qiu, Y. Yang. arXiv preprint arXiv:2402.02416, 2024. Cited by 3.
Align Anything: Training all-modality models to follow instructions with language feedback. J. Ji, J. Zhou, H. Lou, B. Chen, D. Hong, X. Wang, W. Chen, K. Wang, R. Pan, et al. arXiv preprint arXiv:2412.15838, 2024. Cited by 2.
Libra-Leaderboard: Towards responsible AI through a balanced leaderboard of safety and capability. H. Li, X. Han, Z. Zhai, H. Mu, H. Wang, Z. Zhang, Y. Geng, S. Lin, R. Wang, et al. arXiv preprint arXiv:2412.18551, 2024. Cited by 1.