Transferring fairness under distribution shifts via fair consistency regularization. B An, Z Che, M Ding, F Huang. Advances in Neural Information Processing Systems 35, 32582–32597, 2022. Cited by 38.
SAIL: Self-improving efficient online alignment of large language models. M Ding, S Chakraborty, V Agrawal, Z Che, A Koppel, M Wang, A Bedi, ... arXiv preprint arXiv:2406.15567, 2024. Cited by 7.
Can Watermarking Large Language Models Prevent Copyrighted Text Generation and Hide Training Data? MA Panaitescu-Liess, Z Che, B An, Y Xu, P Pathmanathan, S Chakraborty, ... arXiv preprint arXiv:2407.17417, 2024. Cited by 4.
EnsemW2S: Can an Ensemble of LLMs be Leveraged to Obtain a Stronger LLM? A Agrawal, M Ding, Z Che, C Deng, A Satheesh, J Langford, F Huang. arXiv preprint arXiv:2410.04571, 2024. Cited by 2.
Model Tampering Attacks Enable More Rigorous Evaluations of LLM Capabilities. Z Che, S Casper, R Kirk, A Satheesh, S Slocum, LE McKinney, ... arXiv preprint arXiv:2502.05209, 2025.
Auction-Based Regulation for Artificial Intelligence. M Bornstein, Z Che, S Julapalli, A Mohamed, AS Bedi, F Huang. arXiv preprint arXiv:2410.01871, 2024.
PoisonedParrot: Subtle Data Poisoning Attacks to Elicit Copyright-Infringing Content from Large Language Models. MA Panaitescu-Liess, P Pathmanathan, Y Kaya, Z Che, B An, S Zhu, ... NeurIPS Safe Generative AI Workshop, 2024.
Model Manipulation Attacks Enable More Rigorous Evaluations of LLM Capabilities. Z Che, S Casper, A Satheesh, R Gandikota, D Rosati, S Slocum, ... NeurIPS Safe Generative AI Workshop, 2024.