Abhay Sheshadri
Verified email at gatech.edu
Title | Cited by | Year
Latent Adversarial Training Improves Robustness to Persistent Harmful Behaviors in LLMs
A Sheshadri*, A Ewart*, P Guo*, A Lynch*, C Wu*, V Hebbar*, H Sleight, ...
arXiv e-prints, arXiv:2407.15549, 2024
31* | 2024
A mechanistic analysis of a transformer trained on a symbolic multi-step reasoning task
J Brinkmann*, A Sheshadri*, V Levoso*, P Swoboda, C Bartelt
ACL 2024 (Findings), 2024
14 | 2024
Eliciting Language Model Behaviors using Reverse Language Models
J Pfau*, A Infanger*, A Sheshadri*, A Panda, J Michael, C Huebner
NeurIPS SOLAR Workshop, 2023
9 | 2023
Mechanistic Unlearning: Robust Knowledge Unlearning and Editing via Mechanistic Localization
P Guo*, A Syed*, A Sheshadri, A Ewart, GK Dziugaite
arXiv preprint arXiv:2410.12949, 2024
6* | 2024
Obfuscated Activations Bypass LLM Latent-Space Defenses
L Bailey*, A Serrano*, A Sheshadri*, M Seleznyov*, J Taylor*, E Jenner*, ...
arXiv preprint arXiv:2412.09565, 2024
2024