Javier Rando

Cited by

	All	Since 2020
Citations	1486	1485
h-index	12	12
i10-index	15	15
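The h-index and i10-index above follow the usual Google Scholar definitions. The h-index is the largest h such that h articles each have at least h citations. The i10-index counts articles with at least 10 citations. The sketch below recomputes the "All" column. It assumes that the per-article "Cited by" counts in this profile's article list are complete and treats the uncounted 2025 preprint as 0. Under that assumption the 20 counts sum to exactly the 1486 total. The helper names `h_index` and `i10_index` are illustrative only.

```python
def h_index(citations):
    """Largest h such that at least h articles have >= h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited first
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so h cannot grow
    return h


def i10_index(citations):
    """Number of articles with at least 10 citations."""
    return sum(1 for c in citations if c >= 10)


# "Cited by" counts as listed on this profile (asterisked merged counts taken
# as-is; the uncounted 2025 preprint assumed to be 0).
CITATIONS = [499, 319, 175, 138, 112, 63, 31, 26, 24, 17,
             17, 16, 12, 11, 10, 7, 4, 3, 2, 0]

print(sum(CITATIONS), h_index(CITATIONS), i10_index(CITATIONS))  # prints: 1486 12 15
```

The 12th most-cited article has 16 citations (at least 12), but the 13th has only 12 (fewer than 13), so h = 12.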

Citations per year (chart): 2021: 4; 2022: 8; 2023: 161; 2024: 1034; 2025: 266

Co-authors

Florian Tramèr, Assistant Professor of Computer Science, ETH Zurich (verified email at inf.ethz.ch)
Nicholas Carlini, Google DeepMind (verified email at google.com)
Daniel Paleka, ETH Zurich (verified email at inf.ethz.ch)
Stephen Casper, PhD student, MIT (verified email at mit.edu)
He He, New York University (verified email at cs.nyu.edu)
Fernando Perez-Cruz, Sr Adviser, Innovation at Bank for International Settlements (verified email at bis.org)


Other names: Javier Rando Ramirez

PhD Student @ ETH Zurich

Verified email at ai.ethz.ch - Homepage

Artificial Intelligence, Language Models, Safety, Security, Privacy


Title	Authors	Venue	Cited by	Year
Open problems and fundamental limitations of reinforcement learning from human feedback	S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...	Transactions on Machine Learning Research (TMLR). Outstanding Finalist 🏆, 2023	499	2023
Scalable Extraction of Training Data from Aligned, Production Language Models	M Nasr, J Rando, N Carlini, J Hayase, M Jagielski, AF Cooper, ...	International Conference on Learning Representations (ICLR), 2025	319*	2025
Red-Teaming the Stable Diffusion Safety Filter	J Rando, D Paleka, D Lindner, L Heim, F Tramèr	ML Safety Workshop at NeurIPS. Best Paper Award 🏆, 2022	175	2022
Foundational challenges in assuring alignment and safety of large language models	U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...	Transactions on Machine Learning Research (TMLR), 2024	138	2024
Scalable and transferable black-box jailbreaks for language models via persona modulation	R Shah, S Pour, A Tagade, S Casper, J Rando	SoLaR Workshop at NeurIPS, 2023	112	2023
Universal Jailbreak Backdoors from Poisoned Human Feedback	J Rando, F Tramèr	International Conference on Learning Representations (ICLR), 2023	63	2023
"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks	E Mosca, S Agarwal, J Rando-Ramirez, G Groh	Annual Meeting of the Association for Computational Linguistics (ACL), 2022	31	2022
Personas as a Way to Model Truthfulness in Language Models	N Joshi, J Rando, A Saparov, N Kim, H He	Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023	26	2023
An Adversarial Perspective on Machine Unlearning for AI Safety	J Łucki, B Wei, Y Huang, P Henderson, F Tramèr, J Rando	SoLaR Workshop at NeurIPS. Best Technical Paper 🏆, 2024	24*	2024
Attributions toward artificial agents in a modified Moral Turing Test	E Aharoni, S Fernandes, DJ Brady, C Alexander, M Criner, K Queen, ...	Scientific Reports 14 (1), 8458, 2024	17	2024
Competition report: Finding universal jailbreak backdoors in aligned LLMs	J Rando, F Croce, K Mitka, S Shabalin, M Andriushchenko, N Flammarion, ...	arXiv preprint arXiv:2404.14461, 2024	17*	2024
PassGPT: Password modeling and (guided) generation with large language models	J Rando, F Perez-Cruz, B Hitaj	European Symposium on Research in Computer Security, 164–183, 2023	16	2023
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition	E Debenedetti, J Rando, D Paleka*, SF Florin, D Albastroiu, N Cohen, ...	NeurIPS Datasets and Benchmarks. Spotlight 🏆, 2024	12	2024
Uneven coverage of natural disasters in Wikipedia: The case of floods	V Lorini, J Rando, D Saez-Trumper, C Castillo	ISCRAM 2020, 2020	11	2020
Llama Guard 3 Vision: Safeguarding human-AI image understanding conversations	J Chi, U Karn, H Zhan, E Smith, J Rando, Y Zhang, K Plawiak, ZD Coudert, ...	arXiv preprint arXiv:2411.10414, 2024	10	2024
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI	R Hönig, J Rando, N Carlini, F Tramèr	International Conference on Learning Representations (ICLR). Spotlight 🏆, 2024	7	2024
Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO	J Rando, N Naimi, T Baumann, M Mathys	AdvML Frontiers Workshop at ICML, 2022	4	2022
Gradient-based jailbreak images for multimodal fusion models	J Rando, H Korevaar, E Brinkman, I Evtimov, F Tramèr	arXiv preprint arXiv:2410.03489, 2024	3	2024
Persistent Pre-Training Poisoning of LLMs	Y Zhang, J Rando, I Evtimov, J Chi, EM Smith, N Carlini, F Tramèr, ...	International Conference on Learning Representations (ICLR), 2024	2	2024
Adversarial ML Problems Are Getting Harder to Solve and to Evaluate	J Rando, J Zhang, N Carlini, F Tramèr	arXiv preprint arXiv:2502.02260, 2025		2025
