Sicheng Zhu
Title
Cited by
Year
AutoDAN: Interpretable Gradient-Based Adversarial Attacks on Large Language Models
S Zhu, R Zhang, B An, G Wu, J Barrow, Z Wang, F Huang, A Nenkova, ...
First Conference on Language Modeling (COLM) 2024, 2024
Cited by 183* · 2024
Position: On the Possibilities of AI-Generated Text Detection
S Chakraborty, A Bedi, S Zhu, B An, D Manocha, F Huang
Forty-first International Conference on Machine Learning (ICML) 2024, 2024
Cited by 130* · 2024
WAVES: Benchmarking the Robustness of Image Watermarks
T Rabbani, B An, M Ding, A Agrawal, Y Xu, C Deng, S Zhu, A Mohamed, ...
International Conference on Machine Learning (ICML) 2024, 2024
Cited by 31* · 2024
Understanding the generalization benefit of model invariance from a data perspective
S Zhu, B An, F Huang
Advances in Neural Information Processing Systems (NeurIPS) 2021 34, 4328-4341, 2021
Cited by 28 · 2021
Learning adversarially robust representations via worst-case mutual information maximization
S Zhu, X Zhang, D Evans
International Conference on Machine Learning (ICML) 2020, 11609-11618, 2020
Cited by 27 · 2020
More Context, Less Distraction: Zero-shot Visual Classification by Inferring and Conditioning on Contextual Attributes
B An, S Zhu, MA Panaitescu-Liess, CK Mummadi, F Huang
The Twelfth International Conference on Learning Representations (ICLR) 2024, 2024
Cited by 19* · 2024
Automatic pseudo-harmful prompt generation for evaluating false refusals in large language models
B An, S Zhu, R Zhang, MA Panaitescu-Liess, Y Xu, F Huang
First Conference on Language Modeling (COLM) 2024, 2024
Cited by 11 · 2024
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data?
MA Panaitescu-Liess, Z Che, B An, Y Xu, P Pathmanathan, S Chakraborty, ...
arXiv preprint arXiv:2407.17417, 2024
Cited by 6* · 2024
Learning Unforeseen Robustness from Out-of-distribution Data Using Equivariant Domain Translator
S Zhu, B An, F Huang, S Hong
International Conference on Machine Learning (ICML) 2023, 2023
Cited by 1 · 2023
AdvPrefix: An Objective for Nuanced LLM Jailbreaks
S Zhu, B Amos, Y Tian, C Guo, I Evtimov
arXiv preprint arXiv:2412.10321, 2024
2024
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Y Xu, UM Sehwag, A Koppel, S Zhu, B An, F Huang, S Ganesh
arXiv preprint arXiv:2410.08193, 2024
2024
PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models
MA Panaitescu-Liess, P Pathmanathan, Y Kaya, Z Che, B An, S Zhu, ...
NeurIPS Safe Generative AI Workshop 2024, 2024
2024
Like Oil and Water: Group Robustness Methods and Poisoning Defenses Don’t Mix
MA Panaitescu-Liess, Y Kaya, S Zhu, F Huang, T Dumitras
The Twelfth International Conference on Learning Representations (ICLR) 2024, 2024
2024
Articles 1–13