Javier Rando

Cited by

	All	Since 2020
Citations	1357	1356
h-index	12	12
i10-index	13	13

Citations per year (chart): 2021: 4, 2022: 7, 2023: 146, 2024: 1041, 2025: 146

Co-authors

Florian Tramèr, Assistant Professor of Computer Science, ETH Zurich. Verified email at inf.ethz.ch
Nicholas Carlini, Google DeepMind. Verified email at google.com
Daniel Paleka, ETH Zurich. Verified email at inf.ethz.ch
Stephen Casper, PhD student, MIT. Verified email at mit.edu
He He, New York University. Verified email at cs.nyu.edu
Fernando Perez-Cruz, Sr Adviser, Innovation at Bank for International Settlements. Verified email at bis.org

Javier Rando

Other names: Javier Rando Ramirez

PhD Student @ ETH Zurich

Verified email at ai.ethz.ch

Artificial Intelligence, Language Models, Safety, Security, Privacy


Title	Cited by	Year
Open problems and fundamental limitations of reinforcement learning from human feedback. S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ... Transactions on Machine Learning Research (TMLR). Outstanding Finalist 🏆, 2023	467	2023
Scalable Extraction of Training Data from (Production) Language Models. M Nasr, J Rando, N Carlini, J Hayase, M Jagielski, AF Cooper, ... International Conference on Learning Representations (ICLR), 2025	289	2025
Red-Teaming the Stable Diffusion Safety Filter. J Rando, D Paleka, D Lindner, L Heim, F Tramèr. ML Safety Workshop at NeurIPS. Best Paper Award 🏆, 2022	167	2022
Foundational challenges in assuring alignment and safety of large language models. U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ... Transactions on Machine Learning Research (TMLR), 2024	124	2024
Scalable and transferable black-box jailbreaks for language models via persona modulation. R Shah, S Pour, A Tagade, S Casper, J Rando. SoLaR Workshop at NeurIPS, 2023	93	2023
Universal Jailbreak Backdoors from Poisoned Human Feedback. J Rando, F Tramèr. International Conference on Learning Representations (ICLR), 2023	56	2023
"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks. E Mosca, S Agarwal, J Rando-Ramirez, G Groh. Annual Meeting of the Association for Computational Linguistics (ACL), 2022	31	2022
Personas as a Way to Model Truthfulness in Language Models. N Joshi, J Rando, A Saparov, N Kim, H He. Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023	21	2023
An Adversarial Perspective on Machine Unlearning for AI Safety. J Łucki, B Wei, Y Huang, P Henderson, F Tramèr, J Rando. SoLaR Workshop at NeurIPS. Best Technical Paper 🏆, 2024	17*	2024
Competition report: Finding universal jailbreak backdoors in aligned LLMs. J Rando, F Croce, K Mitka, S Shabalin, M Andriushchenko, N Flammarion, ... arXiv preprint arXiv:2404.14461, 2024	17*	2024
PassGPT: password modeling and (guided) generation with large language models. J Rando, F Perez-Cruz, B Hitaj. European Symposium on Research in Computer Security, 164-183, 2023	17	2023
Attributions toward artificial agents in a modified Moral Turing Test. E Aharoni, S Fernandes, DJ Brady, C Alexander, M Criner, K Queen, ... Scientific Reports 14 (1), 8458, 2024	15	2024
Uneven coverage of natural disasters in Wikipedia: The case of floods. V Lorini, J Rando, D Saez-Trumper, C Castillo. ISCRAM 2020, 2020	11	2020
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition. E Debenedetti, J Rando, D Paleka*, SF Florin, D Albastroiu, N Cohen, ... NeurIPS Datasets and Benchmarks. Spotlight 🏆, 2024	9	2024
Llama Guard 3 Vision: Safeguarding human-AI image understanding conversations. J Chi, U Karn, H Zhan, E Smith, J Rando, Y Zhang, K Plawiak, ZD Coudert, ... arXiv preprint arXiv:2411.10414, 2024	8	2024
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI. R Hönig, J Rando, N Carlini, F Tramèr. International Conference on Learning Representations (ICLR). Spotlight 🏆, 2024	7	2024
Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO. J Rando, N Naimi, T Baumann, M Mathys. AdvML Frontiers Workshop at ICML, 2022	3	2022
Gradient-based jailbreak images for multimodal fusion models. J Rando, H Korevaar, E Brinkman, I Evtimov, F Tramèr. arXiv preprint arXiv:2410.03489, 2024	2	2024
The Worst (But Only) Claude 3 Tokenizer. J Rando, F Tramèr. https://javirando.com/blog/2024/claude-tokenizer/, 2024	2	2024
Persistent Pre-Training Poisoning of LLMs. Y Zhang, J Rando, I Evtimov, J Chi, EM Smith, N Carlini, F Tramèr, ... International Conference on Learning Representations (ICLR), 2024	1	2024
