The WMDP benchmark: Measuring and reducing malicious use with unlearning N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti, JD Li, AK Dombrowski, ... arXiv preprint arXiv:2403.03218, 2024 | 105 | 2024 |
Designing a Dashboard for Transparency and Control of Conversational AI Y Chen, A Wu, T DePodesta, C Yeh, K Li, NC Marin, O Patel, J Riecke, ... arXiv preprint arXiv:2406.07882, 2024 | 14 | 2024 |
Humanity's Last Exam L Phan, A Gatti, Z Han, N Li, J Hu, H Zhang, S Shi, M Choi, A Agrawal, ... arXiv preprint arXiv:2501.14249, 2025 | | 2025 |