Javier Rando
Other names: Javier Rando Ramirez
PhD Student @ ETH Zurich
Verified email at ai.ethz.ch
Title
Cited by
Year
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
Transactions on Machine Learning Research (TMLR). Outstanding Finalist 🏆, 2023
Cited by: 467, Year: 2023
Scalable Extraction of Training Data from (Production) Language Models
M Nasr*, J Rando*, N Carlini, J Hayase, M Jagielski, AF Cooper, ...
International Conference on Learning Representations (ICLR), 2025
Cited by: 289, Year: 2025
Red-Teaming the Stable Diffusion Safety Filter
J Rando, D Paleka, D Lindner, L Heim, F Tramèr
ML Safety Workshop at NeurIPS. Best Paper Award 🏆, 2022
Cited by: 167, Year: 2022
Foundational challenges in assuring alignment and safety of large language models
U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...
Transactions on Machine Learning Research (TMLR), 2024
Cited by: 124, Year: 2024
Scalable and transferable black-box jailbreaks for language models via persona modulation
R Shah, S Pour, A Tagade, S Casper, J Rando
SoLaR Workshop at NeurIPS, 2023
Cited by: 93, Year: 2023
Universal Jailbreak Backdoors from Poisoned Human Feedback
J Rando, F Tramèr
International Conference on Learning Representations (ICLR), 2023
Cited by: 56, Year: 2023
"That Is a Suspicious Reaction!": Interpreting Logits Variation to Detect NLP Adversarial Attacks
E Mosca, S Agarwal, J Rando-Ramirez, G Groh
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Cited by: 31, Year: 2022
Personas as a Way to Model Truthfulness in Language Models
N Joshi*, J Rando*, A Saparov, N Kim, H He
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Cited by: 21, Year: 2023
An Adversarial Perspective on Machine Unlearning for AI Safety
J Łucki, B Wei, Y Huang, P Henderson, F Tramèr, J Rando
SoLaR Workshop at NeurIPS. Best Technical Paper 🏆, 2024
Cited by: 17*, Year: 2024
Competition report: Finding universal jailbreak backdoors in aligned LLMs
J Rando, F Croce, K Mitka, S Shabalin, M Andriushchenko, N Flammarion, ...
arXiv preprint arXiv:2404.14461, 2024
Cited by: 17*, Year: 2024
PassGPT: password modeling and (guided) generation with large language models
J Rando, F Perez-Cruz, B Hitaj
European Symposium on Research in Computer Security, 164-183, 2023
Cited by: 17, Year: 2023
Attributions toward artificial agents in a modified Moral Turing Test
E Aharoni, S Fernandes, DJ Brady, C Alexander, M Criner, K Queen, ...
Scientific Reports 14 (1), 8458, 2024
Cited by: 15, Year: 2024
Uneven coverage of natural disasters in Wikipedia: The case of floods
V Lorini, J Rando, D Saez-Trumper, C Castillo
ISCRAM 2020, 2020
Cited by: 11, Year: 2020
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
E Debenedetti*, J Rando*, D Paleka*, SF Florin, D Albastroiu, N Cohen, ...
NeurIPS Datasets and Benchmarks. Spotlight 🏆, 2024
Cited by: 9, Year: 2024
Llama Guard 3 Vision: Safeguarding human-AI image understanding conversations
J Chi, U Karn, H Zhan, E Smith, J Rando, Y Zhang, K Plawiak, ZD Coudert, ...
arXiv preprint arXiv:2411.10414, 2024
Cited by: 8, Year: 2024
Adversarial Perturbations Cannot Reliably Protect Artists From Generative AI
R Hönig, J Rando, N Carlini, F Tramèr
International Conference on Learning Representations (ICLR). Spotlight 🏆, 2024
Cited by: 7, Year: 2024
Exploring Adversarial Attacks and Defenses in Vision Transformers trained with DINO
J Rando, N Naimi, T Baumann, M Mathys
AdvML Frontiers Workshop at ICML, 2022
Cited by: 3, Year: 2022
Gradient-based jailbreak images for multimodal fusion models
J Rando, H Korevaar, E Brinkman, I Evtimov, F Tramèr
arXiv preprint arXiv:2410.03489, 2024
Cited by: 2, Year: 2024
The Worst (But Only) Claude 3 Tokenizer
J Rando, F Tramèr
https://javirando.com/blog/2024/claude-tokenizer/, 2024
Cited by: 2, Year: 2024
Persistent Pre-Training Poisoning of LLMs
Y Zhang*, J Rando*, I Evtimov, J Chi, EM Smith, N Carlini, F Tramèr, ...
International Conference on Learning Representations (ICLR), 2024
Cited by: 1, Year: 2024
Articles 1–20