Aengus Lynch
Verified email at ucl.ac.uk
Title
Cited by
Year
Towards automated circuit discovery for mechanistic interpretability
A Conmy, A Mavor-Parker, A Lynch, S Heimersheim, A Garriga-Alonso
Advances in Neural Information Processing Systems 36, 16318-16352, 2023
Cited by 255; Year 2023
Causal machine learning: A survey and open problems
J Kaddour, A Lynch, Q Liu, MJ Kusner, R Silva
arXiv preprint arXiv:2206.15475, 2022
Cited by 202; Year 2022
Eight methods to evaluate robust unlearning in llms
A Lynch, P Guo, A Ewart, S Casper, D Hadfield-Menell
arXiv preprint arXiv:2402.16835, 2024
Cited by 50; Year 2024
Targeted latent adversarial training improves robustness to persistent harmful behaviors in llms
A Sheshadri, A Ewart, P Guo, A Lynch, C Wu, V Hebbar, H Sleight, ...
arXiv e-prints, arXiv: 2407.15549, 2024
Cited by 32*; Year 2024
Spawrious: A benchmark for fine control of spurious correlation biases
A Lynch, GJS Dovonon, J Kaddour, R Silva
arXiv preprint arXiv:2303.05470, 2023
Cited by 29*; Year 2023
Analysing the generalisation and reliability of steering vectors
D Tan, D Chanin, A Lynch, B Paige, D Kanoulas, A Garriga-Alonso, R Kirk
Advances in Neural Information Processing Systems 37, 139179-139212, 2025
Cited by 9; Year 2025
Best-of-N Jailbreaking
J Hughes, S Price, A Lynch, R Schaeffer, F Barez, S Koyejo, H Sleight, ...
arXiv preprint arXiv:2412.03556, 2024
Cited by 6*; Year 2024
Evaluating the impact of geometric and statistical skews on out-of-distribution generalization performance
A Lynch, J Kaddour, R Silva
NeurIPS 2022 Workshop on Distribution Shifts: Connecting Methods and …, 2022
Cited by 5; Year 2022
H-Space Sparse Autoencoders
A Ijishakin, ML Ang, L Baljer, DCH Tan, HL Fry, A Abdulaal, A Lynch, ...
Neurips Safe Generative AI Workshop 2024, 2024
Cited by 1; Year 2024
How Do Large Language Monkeys Get Their Power (Laws)?
R Schaeffer, J Kazdan, J Hughes, J Juravsky, S Price, A Lynch, E Jones, ...
arXiv preprint arXiv:2502.17578, 2025
Year 2025
Plan B: Training LLMs to fail less severely
J Stastny, N Warncke, D Xu, A Lynch, F Barez, H Sleight, E Perez
Year 2024