| Title, authors, venue | Cited by | Year |
|---|---|---|
| The WMDP benchmark: Measuring and reducing malicious use with unlearning. N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti, JD Li, AK Dombrowski, ... arXiv preprint arXiv:2403.03218, 2024 | 115 | 2024 |
| Will releasing the weights of future large language models grant widespread access to pandemic agents? A Gopal, N Helm-Burger, L Justen, EH Soice, T Tzeng, G Jeyapragasan, ... arXiv preprint arXiv:2310.18233, 2023 | 8 | 2023 |
| The WMDP benchmark: Measuring and reducing malicious use with unlearning, 2024. N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti, JD Li, AK Dombrowski, ... URL https://arxiv.org/abs/2403.03218 | 8 | |
| Will releasing the weights of large language models grant widespread access to pandemic agents. A Gopal, N Helm-Burger, L Justen, EH Soice, T Tzeng, G Jeyapragasan, ... arXiv preprint arXiv 2310, 2023 | 7 | 2023 |
| Will releasing the weights of future large language models grant widespread access to pandemic agents?, November 2023. A Gopal, N Helm-Burger, L Justen, EH Soice, T Tzeng, G Jeyapragasan, ... URL http://arxiv.org/abs/2310.18233 | 4 | |
| Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models. C Tice, PA Kreer, N Helm-Burger, PS Shahani, F Ryzhenkov, J Haimes, ... arXiv preprint arXiv:2412.01784, 2024 | | 2024 |
| Hunting for proprioception in larval zebrafish: does age matter? N Helm-Burger | | 2013 |
| Sandbag Detection through Model Impairment. C Tice, PA Kreer, N Helm-Burger, PS Shahani, F Ryzhenkov, ... Workshop on Socially Responsible Language Modelling Research | | |