Boyuan Chen
Title
Cited by
Year
BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset
J Ji, M Liu, J Dai, X Pan, C Zhang, C Bian, B Chen, R Sun, Y Wang, ...
Advances in Neural Information Processing Systems 36, 24678-24704, 2023
Cited by: 339; Year: 2023
AI alignment: A comprehensive survey
J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang, Y Duan, Z He, J Zhou, ...
arXiv preprint arXiv:2310.19852, 2023
Cited by: 247; Year: 2023
Aligner: Efficient alignment by learning to correct
J Ji, B Chen, H Lou, D Hong, B Zhang, X Pan, T Qiu, J Dai, Y Yang
NeurIPS 2024 (Oral), 2024
Cited by: 58*; Year: 2024
PKU-SafeRLHF: A safety alignment preference dataset for Llama family models
J Ji, D Hong, B Zhang, B Chen, J Dai, B Zheng, T Qiu, B Li, Y Yang
arXiv preprint arXiv:2406.15513, 2024
Cited by: 17; Year: 2024
Language Models Resist Alignment
J Ji, K Wang, T Qiu, B Chen, J Zhou, C Li, H Lou, Y Yang
NeurIPS 2024 SoLaR Workshop, 2024
Cited by: 6; Year: 2024
Align anything: Training all-modality models to follow instructions with language feedback
J Ji, J Zhou, H Lou, B Chen, D Hong, X Wang, W Chen, K Wang, R Pan, ...
arXiv preprint arXiv:2412.15838, 2024
Cited by: 3; Year: 2024
Efficient model-agnostic alignment via Bayesian persuasion
F Bai, M Wang, Z Zhang, B Chen, Y Xu, Y Wen, Y Yang
arXiv preprint arXiv:2405.18718, 2024
Cited by: 3; Year: 2024
Articles 1–7