Kavel Rao
Verified email at cs.washington.edu
Title
Cited by
Value kaleidoscope: Engaging AI with pluralistic human values, rights, and duties
T Sorensen, L Jiang, JD Hwang, S Levine, V Pyatkin, P West, N Dziri, ...
Proceedings of the AAAI Conference on Artificial Intelligence 38 (18), 19937 …, 2024
Cited by 79*, 2024
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
S Han*, K Rao*, A Ettinger, L Jiang, BY Lin, N Lambert, Y Choi, N Dziri
arXiv preprint arXiv:2406.18495, 2024
Cited by 28, 2024
WildTeaming at Scale: From In-the-Wild Jailbreaks to (Adversarially) Safer Language Models
L Jiang, K Rao*, S Han*, A Ettinger, F Brahman, S Kumar, ...
arXiv preprint arXiv:2406.18510, 2024
Cited by 14*, 2024
What Makes it Ok to Set a Fire? Iterative Self-distillation of Contexts and Rationales for Disambiguating Defeasible Social and Moral Situations
K Rao*, L Jiang*, V Pyatkin, Y Gu, N Tandon, N Dziri, F Brahman, Y Choi
arXiv preprint arXiv:2310.15431, 2023
Cited by 13*, 2023
ColorGrid: A Multi-Agent Non-Stationary Environment for Goal Inference and Assistance
A Risukhin, K Rao, B Caffee, A Fan
arXiv preprint arXiv:2501.10593, 2025
2025
To Err is AI: A Case Study Informing LLM Flaw Reporting Practices
S McGregor, A Ettinger, N Judd, P Albee, L Jiang, K Rao, W Smith, ...
arXiv preprint arXiv:2410.12104, 2024
2024