Nathan Helm-Burger
SecureBio
Verified email at securebio.org
Title
Cited by
Year
The wmdp benchmark: Measuring and reducing malicious use with unlearning
N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti, JD Li, AK Dombrowski, ...
arXiv preprint arXiv:2403.03218, 2024
Cited by: 115; Year: 2024
Will releasing the weights of future large language models grant widespread access to pandemic agents?
A Gopal, N Helm-Burger, L Justen, EH Soice, T Tzeng, G Jeyapragasan, ...
arXiv preprint arXiv:2310.18233, 2023
Cited by: 8; Year: 2023
The wmdp benchmark: Measuring and reducing malicious use with unlearning, 2024
N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti, JD Li, AK Dombrowski, ...
URL https://arxiv.org/abs/2403.03218
Cited by: 8
Will releasing the weights of large language models grant widespread access to pandemic agents
A Gopal, N Helm-Burger, L Justen, EH Soice, T Tzeng, G Jeyapragasan, ...
arXiv preprint arXiv 2310, 2023
Cited by: 7; Year: 2023
Will releasing the weights of future large language models grant widespread access to pandemic agents?, November 2023
A Gopal, N Helm-Burger, L Justen, EH Soice, T Tzeng, G Jeyapragasan, ...
URL http://arxiv.org/abs/2310.18233
Cited by: 4
Noise Injection Reveals Hidden Capabilities of Sandbagging Language Models
C Tice, PA Kreer, N Helm-Burger, PS Shahani, F Ryzhenkov, J Haimes, ...
arXiv preprint arXiv:2412.01784, 2024
Year: 2024
Hunting for proprioception in larval zebrafish: does age matter?
N Helm-Burger
Year: 2013
Sandbag Detection through Model Impairment
C Tice, PA Kreer, N Helm-Burger, PS Shahani, F Ryzhenkov, ...
Workshop on Socially Responsible Language Modelling Research
Articles 1–8