Nathaniel Li

Citations per year: 2023 (56), 2024 (609), 2025 (103)

Public access (based on funding mandates): 1 article available, 0 articles not available

Andy Zou - PhD Student, Carnegie Mellon University - Verified email at andrew.cmu.edu
Dan Hendrycks - Director of the Center for AI Safety (advisor for xAI and Scale) - Verified email at berkeley.edu
Steven Basart - PhD, University of Chicago - Verified email at ttic.edu
Alexander Pan - UC Berkeley - Verified email at berkeley.edu
Zifan Wang - ScaleAI - Verified email at scale.com
Mantas Mazeika - University of Illinois Urbana-Champaign - Verified email at illinois.edu
Hugh Zhang - Scale AI - Verified email at seas.harvard.edu
Cristina Menghini - Scale AI - Verified email at scale.com

Nathaniel Li

Scale AI, Center for AI Safety

Verified email at berkeley.edu


Title	Cited by	Year
Representation Engineering: A Top-Down Approach to AI Transparency. A Zou, L Phan, S Chen, J Campbell, P Guo, R Ren*, A Pan, X Yin, ... arXiv preprint arXiv:2310.01405, 2023	317*	2023
HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal. M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu, E Sakhaee, N Li, ... ICML 2024, 2024	190*	2024
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark. A Pan, JS Chan, A Zou*, N Li, S Basart, T Woodside, H Zhang, ... ICML 2023 (Oral), 26837-26867, 2023	136	2023
The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning. N Li, A Pan, A Gopal†, S Yue†, D Berrios†, A Gatti‡, JD Li‡, ... ICML 2024, 2024	105	2024
LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet. N Li, Z Han, I Steneker, W Primack, R Goodside, H Zhang, Z Wang, ... NeurIPS 2024 Red Teaming Workshop (Oral), 2024	25	2024
Humanity's Last Exam. L Phan, A Gatti, Z Han, N Li, J Hu, H Zhang, S Shi, M Choi, A Agrawal, ... arXiv preprint arXiv:2501.14249, 2025		2025
Robustness Evaluation of Proxy Models Against Adversarial Optimization. A Zou, L Phan, N Li, JS Chan, M Mazeika, A O'Gara, S Basart, J Ng, ...
