Kavel Rao
Verified email at cs.washington.edu
Title
Cited by
Year
Value Kaleidoscope: Engaging AI with Pluralistic Human Values, Rights, and Duties
T Sorensen, L Jiang, JD Hwang, S Levine, V Pyatkin, P West, N Dziri, ...
Proceedings of the AAAI Conference on Artificial Intelligence 38 (18), 19937 …, 2024
Cited by: 84 | Year: 2024
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
S Han*, K Rao*, A Ettinger, L Jiang, BY Lin, N Lambert, Y Choi, N Dziri
arXiv preprint arXiv:2406.18495, 2024
Cited by: 37 | Year: 2024
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
L Jiang, K Rao*, S Han*, A Ettinger, F Brahman, S Kumar, ...
arXiv preprint arXiv:2406.18510, 2024
Cited by: 16* | Year: 2024
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations
K Rao*, L Jiang*, V Pyatkin, Y Gu, N Tandon, N Dziri, F Brahman, Y Choi
arXiv preprint arXiv:2310.15431, 2023
Cited by: 12* | Year: 2023
ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance
A Risukhin, K Rao, B Caffee, A Fan
arXiv preprint arXiv:2501.10593, 2025
Year: 2025
To Err is AI: A Case Study Informing LLM Flaw Reporting Practices
S McGregor, A Ettinger, N Judd, P Albee, L Jiang, K Rao, W Smith, ...
arXiv preprint arXiv:2410.12104, 2024
Year: 2024
Articles 1–6