Henry Sleight - Academic Search


Cited by

	All	Since 2020
Citations	107	107
h-index	5	5
i10-index	2	2

Citations per year: 2023: 1, 2024: 66, 2025: 38

Co-authors

John Hughes - Anthropic Contractor, MATS, Speechmatics - Verified email at speechmatics.com
Ethan Perez - Anthropic; New York University - Verified email at anthropic.com
Rylan Schaeffer - Stanford University - Verified email at stanford.edu

Henry Sleight

Research Manager, Anthropic Fellows Program, Program Manager, Constellation

Verified email at constellation.org - Homepage

AI Safety, Adversarial Robustness, Model Organisms of Misalignment


Title	Cited by	Year
Is model collapse inevitable? Breaking the curse of recursion by accumulating real and synthetic data M Gerstgrasser, R Schaeffer, A Dey, R Rafailov, H Sleight, J Hughes, ... arXiv preprint arXiv:2404.01413, 2024	44	2024
Targeted latent adversarial training improves robustness to persistent harmful behaviors in LLMs A Sheshadri, A Ewart, P Guo, A Lynch, C Wu, V Hebbar, H Sleight, ... arXiv e-prints, arXiv: 2407.15549, 2024	18	2024
Latent adversarial training improves robustness to persistent harmful behaviors in LLMs A Sheshadri, A Ewart, P Guo, A Lynch, C Wu, V Hebbar, H Sleight, ... arXiv preprint arXiv:2407.15549, 2024	9	2024
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? R Schaeffer, D Valentine, L Bailey, J Chua, C Eyzaguirre, Z Durante, ... arXiv preprint arXiv:2407.15211, 2024	8	2024
Is model collapse inevitable M Gerstgrasser, R Schaeffer, A Dey, R Rafailov, H Sleight, J Hughes, ... Breaking the curse of recursion by accumulating real and synthetic data …, 2024	5	2024
When do universal image jailbreaks transfer between vision-language models?, 2024 R Schaeffer, D Valentine, L Bailey, J Chua, C Eyzaguirre, Z Durante, ... URL https://arxiv.org/abs/2407.15211, 0	5
Looking inward: Language models can learn about themselves by introspection, 2024 FJ Binder, J Chua, T Korbak, H Sleight, J Hughes, R Long, E Perez, ... URL https://arxiv.org/abs/2410.13787, 0	5
Is model collapse inevitable? Breaking the curse of recursion by accumulating real and synthetic data, 2024 M Gerstgrasser, R Schaeffer, A Dey, R Rafailov, H Sleight, J Hughes, ... URL https://arxiv.org/abs/2404.01413, 0	5
Best-of-N jailbreaking J Hughes, S Price, A Lynch, R Schaeffer, F Barez, S Koyejo, H Sleight, ... arXiv preprint arXiv:2412.03556, 2024	3	2024
Jailbreak Defense in a Narrow Domain: Limitations of Existing Methods and a New Transcript-Classifier Approach TT Wang, J Hughes, H Sleight, R Schaeffer, R Agrawal, F Barez, ... arXiv preprint arXiv:2412.02159, 2024	2	2024
Looking Inward: Language Models Can Learn About Themselves by Introspection FJ Binder, J Chua, T Korbak, H Sleight, J Hughes, R Long, E Perez, ... arXiv preprint arXiv:2410.13787, 2024	2	2024
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models R Schaeffer, D Valentine, L Bailey, J Chua, C Eyzaguirre, Z Durante, ... The Thirteenth International Conference on Learning Representations, 0	1
Adaptive Deployment of Untrusted LLMs Reduces Distributed Threats J Wen, V Hebbar, C Larson, A Bhatt, A Radhakrishnan, M Sharma, ... arXiv preprint arXiv:2411.17693, 2024		2024
Rapid Response: Mitigating LLM Jailbreaks with a Few Examples A Peng, J Michael, H Sleight, E Perez, M Sharma arXiv preprint arXiv:2411.07494, 2024		2024
Attacking Audio Language Models with Best-of-N Jailbreaking J Hughes, S Price, A Lynch, R Schaeffer, F Barez, S Koyejo, H Sleight, ...
Plan B: Training LLMs to fail less severely J Stastny, N Warncke, D Xu, A Lynch, F Barez, H Sleight, E Perez
Jailbreak Defense in a Narrow Domain: Failures of existing methods and Improving Transcript-Based Classifiers TT Wang, J Hughes, H Sleight, R Schaeffer, R Agrawal, F Barez, ... The Third Workshop on New Frontiers in Adversarial Machine Learning, 0
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs A Ewart, A Sheshadri, PH Guo, A Lynch, C Wu, V Hebbar, H Sleight, ... Workshop on Socially Responsible Language Modelling Research, 0