Boyuan Chen
Title · Cited by · Year
BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset
J Ji, M Liu, J Dai, X Pan, C Zhang, C Bian, B Chen, R Sun, Y Wang, ...
Advances in Neural Information Processing Systems 36, 2024
Cited by 308 · 2024
AI alignment: A comprehensive survey
J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang, Y Duan, Z He, J Zhou, ...
arXiv preprint arXiv:2310.19852, 2023
Cited by 226 · 2023
Aligner: Efficient alignment by learning to correct
J Ji, B Chen, H Lou, D Hong, B Zhang, X Pan, T Qiu, J Dai, Y Yang
NeurIPS 2024 (Oral), 2024
Cited by 50* · 2024
PKU-SafeRLHF: A safety alignment preference dataset for Llama family models
J Ji, D Hong, B Zhang, B Chen, J Dai, B Zheng, T Qiu, B Li, Y Yang
CoRR, 2024
Cited by 14 · 2024
Language Models Resist Alignment
J Ji, K Wang, T Qiu, B Chen, J Zhou, C Li, H Lou, Y Yang
NeurIPS 2024 SoLaR Workshop, 2024
Cited by 5 · 2024
Align anything: Training all-modality models to follow instructions with language feedback
J Ji, J Zhou, H Lou, B Chen, D Hong, X Wang, W Chen, K Wang, R Pan, ...
arXiv preprint arXiv:2412.15838, 2024
Cited by 2 · 2024
Efficient Model-agnostic Alignment via Bayesian Persuasion
F Bai, M Wang, Z Zhang, B Chen, Y Xu, Y Wen, Y Yang
arXiv preprint arXiv:2405.18718, 2024
Cited by 1 · 2024
Articles 1–7