Aidan O'Gara
Verified email at usc.edu
Title
Cited by
Year
AI alignment: A comprehensive survey
J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang, Y Duan, Z He, J Zhou, ...
arXiv preprint arXiv:2310.19852, 2023
Cited by: 226, Year: 2023
AI deception: A survey of examples, risks, and potential solutions
PS Park, S Goldstein, A O’Gara, M Chen, D Hendrycks
Patterns 5 (5), 2024
Cited by: 161, Year: 2024
Hoodwinked: Deception and cooperation in a text-based game for language models
A O'Gara
arXiv preprint arXiv:2308.01404, 2023
Cited by: 26, Year: 2023
AI deception: A survey of examples, risks, and potential solutions. arXiv
PS Park, S Goldstein, A O’Gara, M Chen, D Hendrycks
URL: http://arxiv.org/abs/2308.14752, 2023
Cited by: 9, Year: 2023
AI Deception: A Survey of Examples
PS Park, S Goldstein, A O’Gara, M Chen, D Hendrycks
Risks, and Potential Solutions. arXiv, 1-30, 2023
Cited by: 6, Year: 2023
AI alignment: A comprehensive survey. arXiv
J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang, Y Duan, Z He, J Zhou, ...
arXiv preprint arXiv:2310.19852, 2023
Cited by: 6, Year: 2023
Open Problems in Machine Unlearning for AI Safety
F Barez, T Fu, A Prabhu, S Casper, A Sanyal, A Bibi, A O'Gara, R Kirk, ...
arXiv preprint arXiv:2501.04952, 2025
Cited by: 1, Year: 2025
Robustness Evaluation of Proxy Models against Adversarial Optimization
A Zou, L Phan, N Li, JS Chan, M Mazeika, A O'Gara, S Basart, J Ng, ...
Articles 1–8