Javier Rando

Cited by

	All	Since 2020
Citations	1287	1286
h-index	12	12
i10-index	13	13

Citations per year: 2021: 4; 2022: 7; 2023: 141; 2024: 1049; 2025: 72

Co-authors

Florian Tramèr, Assistant Professor of Computer Science, ETH Zurich. Verified email at inf.ethz.ch
Nicholas Carlini, Google DeepMind. Verified email at google.com
Daniel Paleka, ETH Zurich. Verified email at inf.ethz.ch
Stephen Casper, PhD student, MIT. Verified email at mit.edu
He He, New York University. Verified email at cs.nyu.edu
Fernando Perez-Cruz, Sr Adviser, Innovation at Bank for International Settlements. Verified email at bis.org

Other names: Javier Rando Ramirez

PhD Student @ ETH Zurich

Verified email at ai.ethz.ch

Artificial Intelligence, Language Models, Safety, Security, Privacy


Title	Cited by	Year
Open problems and fundamental limitations of reinforcement learning from human feedback. S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... Transactions on Machine Learning Research (TMLR). Outstanding Finalist 🏆, 2023	449	2023
Scalable Extraction of Training Data from (Production) Language Models. M Nasr, J Rando, N Carlini, J Hayase, M Jagielski, AF Cooper, ... International Conference on Learning Representations (ICLR), 2025	279	2025
Red-Teaming the Stable Diffusion Safety Filter. J Rando, D Paleka, D Lindner, L Heim, F Tramèr. ML Safety Workshop at NeurIPS. Best Paper Award 🏆, 2022	159	2022
Foundational challenges in assuring alignment and safety of large language models. U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ... Transactions on Machine Learning Research (TMLR), 2024	115	2024
Scalable and transferable black-box jailbreaks for language models via persona modulation. R Shah, S Pour, A Tagade, S Casper, J Rando. SoLaR Workshop at NeurIPS, 2023	88	2023
Universal Jailbreak Backdoors from Poisoned Human Feedback. J Rando, F Tramèr. International Conference on Learning Representations (ICLR), 2023	53	2023
"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks. E Mosca, S Agarwal, J Rando-Ramirez, G Groh. Annual Meeting of the Association for Computational Linguistics (ACL), 2022	31	2022
Personas as a Way to Model Truthfulness in Language Models. N Joshi, J Rando, A Saparov, N Kim, H He. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023	21	2023
Competition report: Finding universal jailbreak backdoors in aligned llms. J Rando, F Croce, K Mitka, S Shabalin, M Andriushchenko, N Flammarion, ... arXiv preprint arXiv:2404.14461, 2024	17*	2024
PassGPT: password modeling and (guided) generation with large language models. J Rando, F Perez-Cruz, B Hitaj. European Symposium on Research in Computer Security, 164-183, 2023	17	2023
An Adversarial Perspective on Machine Unlearning for AI Safety. J Łucki, B Wei, Y Huang, P Henderson, F Tramèr, J Rando. SoLaR Workshop at NeurIPS. Best Technical Paper 🏆, 2024	14*	2024
Attributions toward artificial agents in a modified Moral Turing Test. E Aharoni, S Fernandes, DJ Brady, C Alexander, M Criner, K Queen, ... Scientific Reports 14 (1), 8458, 2024	13	2024
Uneven coverage of natural disasters in Wikipedia: The case of floods. V Lorini, J Rando, D Saez-Trumper, C Castillo. ISCRAM 2020, 2020	11	2020
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition. E Debenedetti, J Rando, D Paleka*, SF Florin, D Albastroiu, N Cohen, ... NeurIPS Datasets and Benchmarks. Spotlight 🏆, 2024	8	2024
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI. R Hönig, J Rando, N Carlini, F Tramèr. International Conference on Learning Representations (ICLR), 2024	5	2024
Llama guard 3 vision: Safeguarding human-ai image understanding conversations. J Chi, U Karn, H Zhan, E Smith, J Rando, Y Zhang, K Plawiak, ZD Coudert, ... arXiv preprint arXiv:2411.10414, 2024	2	2024
The Worst (But Only) Claude 3 Tokenizer. J Rando, F Tramèr. https://javirando.com/blog/2024/claude-tokenizer/, 2024	2	2024
Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO. J Rando, N Naimi, T Baumann, M Mathys. AdvML Frontiers Workshop at ICML, 2022	2	2022
Gradient-based jailbreak images for multimodal fusion models. J Rando, H Korevaar, E Brinkman, I Evtimov, F Tramèr. arXiv preprint arXiv:2410.03489, 2024	1	2024
Measuring Non-Adversarial Reproduction of Training Data in Large Language Models. M Aerni, J Rando, E Debenedetti, N Carlini, D Ippolito, F Tramèr. International Conference on Learning Representations (ICLR), 2024		2024
