| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Safety cases: How to justify the safety of advanced AI systems | J Clymer, N Gabrieli, D Krueger, T Larsen | arXiv preprint arXiv:2403.10462 | 26* | 2024 |
| Affirmative safety: An approach to risk management for advanced AI | A Wasil, J Clymer, D Krueger, E Dardaman, S Campos, E Murphy | Available at SSRN 4806274 | 9* | 2024 |
| Generalization Analogies: A Testbed for Generalizing AI Oversight to Hard-To-Measure Domains | J Clymer, G Baker, R Subramani, S Wang | arXiv preprint arXiv:2311.07723 | 8 | 2023 |
| Towards evaluations-based safety cases for AI scheming | M Balesni, M Hobbhahn, D Lindner, A Meinke, T Korbak, J Clymer, ... | arXiv preprint arXiv:2411.03336 | 7 | 2024 |
| RE-Bench: Evaluating frontier AI R&D capabilities of language model agents against human experts | H Wijk, T Lin, J Becker, S Jawhar, N Parikh, T Broadley, L Chan, M Chen, ... | arXiv preprint arXiv:2411.15114 | 2 | 2024 |
| Poser: Unmasking Alignment Faking LLMs by Manipulating Their Internals | J Clymer, C Juang, S Field | arXiv preprint arXiv:2405.05466 | 2 | 2024 |
| A sketch of an AI control safety case | T Korbak, J Clymer, B Hilton, B Shlegeris, G Irving | arXiv preprint arXiv:2501.17315 | | 2025 |