Xinyue Shen
CISPA Helmholtz Center for Information Security
Verified email at cispa.de
Title
Cited by
Year
"Do Anything Now": Characterizing and evaluating in-the-wild jailbreak prompts on large language models
X Shen, Z Chen, M Backes, Y Shen, Y Zhang
Proceedings of the 2024 on ACM SIGSAC Conference on Computer and …, 2024
Cited by: 425 | Year: 2024
In ChatGPT We Trust? Measuring and Characterizing the Reliability of ChatGPT
X Shen, Z Chen, M Backes, Y Zhang
arXiv preprint arXiv:2304.08979, 2023
Cited by: 136 | Year: 2023
MGTBench: Benchmarking Machine-Generated Text Detection
X He, X Shen, Z Chen, M Backes, Y Zhang
Proceedings of the 2024 on ACM SIGSAC Conference on Computer and …, 2024
Cited by: 96 | Year: 2024
Unsafe diffusion: On the generation of unsafe images and hateful memes from text-to-image models
Y Qu, X Shen, X He, M Backes, S Zannettou, Y Zhang
Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications …, 2023
Cited by: 96 | Year: 2023
Evil Under the Sun: Understanding and Discovering Attacks on Ethereum Decentralized Applications
L Su, X Shen, X Du, X Liao, XF Wang, L Xing, B Liu
Cited by: 72 | Year: 2021
Comprehensive Assessment of Jailbreak Attacks Against LLMs
J Chu, Y Liu, Z Yang, X Shen, M Backes, Y Zhang
arXiv preprint arXiv:2402.05668, 2024
Cited by: 59 | Year: 2024
Prompt Stealing Attacks Against Text-to-Image Generation Models
X Shen, Y Qu, M Backes, Y Zhang
33rd USENIX Security Symposium (USENIX Security 24), 5823-5840, 2024
Cited by: 25 | Year: 2024
On Xing Tian and the Perseverance of Anti-China Sentiment Online
X Shen, X He, M Backes, J Blackburn, S Zannettou, Y Zhang
Proceedings of the International AAAI Conference on Web and Social Media 16 …, 2022
Cited by: 21 | Year: 2022
UnsafeBench: Benchmarking Image Safety Classifiers on Real-World and AI-Generated Images
Y Qu, X Shen, Y Wu, M Backes, S Zannettou, Y Zhang
arXiv preprint arXiv:2405.03486, 2024
Cited by: 11 | Year: 2024
Backdoor Attacks in the Supply Chain of Masked Image Modeling
X Shen, X He, Z Li, Y Shen, M Backes, Y Zhang
arXiv preprint arXiv:2210.01632, 2022
Cited by: 9 | Year: 2022
Voice Jailbreak Attacks Against GPT-4o
X Shen, Y Wu, M Backes, Y Zhang
arXiv preprint arXiv:2405.19103, 2024
Cited by: 6 | Year: 2024
Comprehensive Assessment of Toxicity in ChatGPT
B Zhang, X Shen, WM Si, Z Sha, Z Chen, A Salem, Y Shen, M Backes, ...
arXiv preprint arXiv:2311.14685, 2023
Cited by: 5 | Year: 2023
Games and Beyond: Analyzing the Bullet Chats of Esports Livestreaming
Y Jiang, X Shen, R Wen, Z Sha, J Chu, Y Liu, M Backes, Y Zhang
Proceedings of the International AAAI Conference on Web and Social Media 18 …, 2024
Cited by: 3 | Year: 2024
ModSCAN: Measuring Stereotypical Bias in Large Vision-Language Models from Vision and Language Modalities
Y Jiang, Z Li, X Shen, Y Liu, M Backes, Y Zhang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language …, 2024
Cited by: 2* | Year: 2024
HateBench: Benchmarking Hate Speech Detectors on LLM-Generated Content and Hate Campaigns
X Shen, Y Wu, Y Qu, M Backes, S Zannettou, Y Zhang
34th USENIX Security Symposium (USENIX Security 25), 2025
Cited by: 1 | Year: 2025
Are We in the AI-Generated Text World Already? Quantifying and Monitoring AIGT on Social Media
Z Sun, Z Zhang, X Shen, Z Zhang, Y Liu, M Backes, Y Zhang, X He
arXiv preprint arXiv:2412.18148, 2024
Year: 2024
The Death and Life of Great Prompts: Analyzing the Evolution of LLM Prompts from the Structural Perspective
Y Ma, X Shen, Y Wu, B Zhang, M Backes, Y Zhang
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Year: 2024