Zidi Xiong
Verified email at g.harvard.edu - Homepage
Title · Cited by · Year
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models.
B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu, Z Xiong, R Dutta, ...
NeurIPS, 2023
Cited by 383 · 2023
Badchain: Backdoor chain-of-thought prompting for large language models
Z Xiang, F Jiang, Z Xiong, B Ramasubramanian, R Poovendran, B Li
arXiv preprint arXiv:2401.12242, 2024
Cited by 57 · 2024
Rigorllm: Resilient guardrails for large language models against undesired content
Z Yuan, Z Xiong, Y Zeng, N Yu, R Jia, D Song, B Li
arXiv preprint arXiv:2403.13031, 2024
Cited by 30 · 2024
Umd: Unsupervised model detection for x2x backdoor attacks
Z Xiang, Z Xiong, B Li
International Conference on Machine Learning, 38013-38038, 2023
Cited by 15 · 2023
GuardAgent: Safeguard LLM Agents by a Guard Agent via Knowledge-Enabled Reasoning
Z Xiang, L Zheng, Y Li, J Hong, Q Li, H Xie, J Zhang, Z Xiong, C Xie, ...
arXiv preprint arXiv:2406.09187, 2024
Cited by 10 · 2024
CBD: A certified backdoor detector based on local dominant probability
Z Xiang, Z Xiong, B Li
Advances in Neural Information Processing Systems 36, 2024
Cited by 9 · 2024
Label-smoothed backdoor attack
M Peng, Z Xiong, M Sun, P Li
arXiv preprint arXiv:2202.11203, 2022
Cited by 9 · 2022
DecodingTrust: A comprehensive assessment of trustworthiness in GPT models. arXiv
B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu, Z Xiong, R Dutta, ...
arXiv preprint arXiv:2306.11698, 2024
Cited by 8 · 2024
DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models (2023)
B Wang, W Chen, H Pei, C Xie, M Kang, C Zhang, C Xu, Z Xiong, R Dutta, ...
Cited by 7 · 2023
Backdoor chain-of-thought prompting for large language models
Z Xiang, F Jiang, Z Xiong, B Ramasubramanian, R Poovendran, B Li
NeurIPS Workshops, 2023
Cited by 5 · 2023
Rethinking the Necessity of Labels in Backdoor Removal
Z Xiong, D Wu, Y Wang, Y Wang
ICLR 2023 Workshop on Backdoor Attacks and Defenses in Machine Learning, 2023
Cited by 1 · 2023