‪Alexandra Souly‬ - ‪Academic Search‬

Cited by

	All	Since 2020
Citations	118	118
h-index	4	4
i10-index	3	3

Citations per year: 2022: 2; 2023: 10; 2024: 91; 2025: 15

Alexandra Souly

Verified email at ucl.ac.uk

AI Safety & ML Security


Title	Cited by	Year
A StrongREJECT for Empty Jailbreaks A Souly, Q Lu, D Bowen, T Trinh, E Hsieh, S Pandey, P Abbeel, ... arXiv preprint arXiv:2402.10260, 2024	51*	2024
JaxMARL: Multi-Agent RL Environments and Algorithms in JAX A Rutherford, B Ellis, M Gallici, J Cook, A Lupu, G Ingvarsson, T Willi, ... Proceedings of the 23rd International Conference on Autonomous Agents and …, 2024	46*	2024
Retrospective on the 2021 MineRL BASALT Competition on Learning from Human Feedback R Shah, SH Wang, C Wild, S Milani, A Kanervisto, VG Goecks, ... NeurIPS 2021 Competitions and Demonstrations Track, 259-272, 2022	11*	2022
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents M Andriushchenko, A Souly, M Dziemian, D Duenas, M Lin, J Wang, ... arXiv preprint arXiv:2410.09024, 2024	7	2024
Leading the Pack: N-player Opponent Shaping A Souly, T Willi, A Khan, R Kirk, C Lu, E Grefenstette, T Rocktäschel arXiv preprint arXiv:2312.12564, 2023	3	2023
How to Evaluate Jailbreak Methods: A Case Study With the StrongREJECT Benchmark The paper in question claimed an impressive 43% success rate in jailbreaking GPT-4 by … D Bowen, S Emmons, A Souly, Q Lu, T Trinh, E Hsieh, S Pandey, ...

Articles 1–6