Luxi He
Department of Computer Science, Princeton University
Verified email at princeton.edu - Homepage
Title
Cited by
Year
Sorry-bench: Systematically evaluating large language model safety refusal behaviors
T Xie, X Qi, Y Zeng, Y Huang, UM Sehwag, K Huang, L He, B Wei, D Li, ...
ICLR 2025, 2024
Cited by 28
What is in Your Safe Data? Identifying Benign Data that Breaks Safety
L He, M Xia, P Henderson
COLM 2024, 2024
Cited by 24
Charxiv: Charting gaps in realistic chart understanding in multimodal llms
Z Wang, M Xia, L He, H Chen, Y Liu, R Zhu, K Liang, X Wu, H Liu, ...
NeurIPS 2024 Dataset & Benchmark, 2024
Cited by 21
Aleatoric and epistemic discrimination: Fundamental limits of fairness interventions
H Wang, L He, R Gao, F Calmon
Advances in Neural Information Processing Systems 36, 2024
Cited by 16
AI Risk Management Should Incorporate Both Safety and Security
X Qi, Y Huang, Y Zeng, E Debenedetti, J Geiping, L He, K Huang, ...
arXiv preprint arXiv:2405.19524, 2024
Cited by 12
Fantastic Copyrighted Beasts and How (Not) to Generate Them
L He, Y Huang, W Shi, T Xie, H Liu, Y Wang, L Zettlemoyer, C Zhang, ...
ICLR 2025, 2024
Cited by 9
Sorry-bench: Systematically evaluating large language model safety refusal behaviors, 2024
T Xie, X Qi, Y Zeng, Y Huang, UM Sehwag, K Huang, L He, B Wei, D Li, ...
URL https://arxiv.org/abs/2406.14598
Cited by 5
On evaluating the durability of safeguards for open-weight llms
X Qi, B Wei, N Carlini, Y Huang, T Xie, L He, M Jagielski, M Nasr, P Mittal, ...
ICLR 2025, 2024
Cited by 4
Metadata Conditioning Accelerates Language Model Pre-training
T Gao, A Wettig, L He, Y Dong, S Malladi, D Chen
arXiv preprint arXiv:2501.01956, 2025
Cited by 1
Cascaded to End-to-End: New Safety, Security, and Evaluation Questions for Audio Language Models
L He, X Qi, I Cheong, P Mittal, D Chen, P Henderson