Jailbreaking black box large language models in twenty queries P Chao, A Robey, E Dobriban, H Hassani, GJ Pappas, E Wong arXiv preprint arXiv:2310.08419, 2023 | 428 | 2023 |
GPT-4o system card A Hurst, A Lerer, AP Goucher, A Perelman, A Ramesh, A Clark, AJ Ostrow, ... arXiv preprint arXiv:2410.21276, 2024 | 128 | 2024 |
Adversarial prompting for black box foundation models N Maus, P Chao, E Wong, J Gardner arXiv preprint arXiv:2302.04237 1 (2), 2023 | 101* | 2023 |
JailbreakBench: An open robustness benchmark for jailbreaking large language models P Chao, E Debenedetti, A Robey, M Andriushchenko, F Croce, V Sehwag, ... arXiv preprint arXiv:2404.01318, 2024 | 93 | 2024 |
A safe harbor for ai evaluation and red teaming S Longpre, S Kapoor, K Klyman, A Ramaswami, R Bommasani, ... arXiv preprint arXiv:2403.04893, 2024 | 29 | 2024 |
OpenAI o1 system card A Jaech, A Kalai, A Lerer, A Richardson, A El-Kishky, A Low, A Helyar, ... arXiv preprint arXiv:2412.16720, 2024 | 21 | 2024 |
Interventional and counterfactual inference with diffusion models P Chao, P Blöbaum, SP Kasiviswanathan arXiv preprint arXiv:2302.00860 4, 16, 2023 | 19 | 2023 |
Jailbreaking black box large language models in twenty queries, 2024 P Chao, A Robey, E Dobriban, H Hassani, GJ Pappas, E Wong URL https://arxiv.org/abs/2310.08419 | 11 | |
Jailbreaking black box large language models in twenty queries. arXiv 2023 P Chao, A Robey, E Dobriban, H Hassani, GJ Pappas, E Wong arXiv preprint arXiv:2310.08419, 2023 | 8 | 2023 |
AdaPT-GMM: Powerful and robust covariate-assisted multiple testing P Chao, W Fithian arXiv preprint arXiv:2106.15812, 2021 | 7 | 2021 |
Different definitions of conic sections in hyperbolic geometry P Chao, J Rosenberg Involve, a Journal of Mathematics 11 (5), 753-768, 2018 | 7 | 2018 |
Statistical Estimation Under Distribution Shift: Wasserstein Perturbations and Minimax Theory P Chao, E Dobriban arXiv preprint arXiv:2308.01853, 2023 | 4 | 2023 |
Generative models for pose transfer P Chao, A Li, G Swamy arXiv preprint arXiv:1806.09070, 2018 | 3 | 2018 |
Watermarking Language Models with Error Correcting Codes P Chao, E Dobriban, H Hassani arXiv preprint arXiv:2406.10281, 2024 | 2 | 2024 |
Position: A Safe Harbor for AI Evaluation and Red Teaming S Longpre, S Kapoor, K Klyman, A Ramaswami, R Bommasani, ... Forty-first International Conference on Machine Learning, 2023 | 1 | 2023 |
Adversarial Robustness for Estimation and Alignment P Chao University of Pennsylvania, 2024 | | 2024 |
Modeling Causal Mechanisms with Diffusion Models for Interventional and Counterfactual Queries P Chao, P Blöbaum, SK Patel, S Kasiviswanathan Transactions on Machine Learning Research | | |