Zifan Wang

Cited by

	All	Since 2020
Citations	3611	3605
h-index	15	15
i10-index	15	15

Citations per year: 2020: 31, 2021: 191, 2022: 334, 2023: 665, 2024: 1984, 2025: 392

Public access

7 articles

1 article

available

not available

Based on funding mandates


Zifan Wang

ScaleAI

Verified email at scale.com

Machine Learning, Adversarial Robustness, AI Safety


Title	Authors	Venue	Cited by	Year
Score-CAM: Score-weighted visual explanations for convolutional neural networks	H Wang, Z Wang, M Du, F Yang, Z Zhang, S Ding, P Mardziel, X Hu	Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2020	1288	2020
Universal and transferable adversarial attacks on aligned language models	A Zou, Z Wang, N Carlini, M Nasr, JZ Kolter, M Fredrikson	arXiv preprint arXiv:2307.15043, 2023	1168	2023
Representation engineering: A top-down approach to AI transparency	A Zou, L Phan, S Chen, J Campbell, P Guo, R Ren, A Pan, X Yin, ...	arXiv preprint arXiv:2310.01405, 2023	317	2023
HarmBench: A standardized evaluation framework for automated red teaming and robust refusal	M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu, E Sakhaee, N Li, ...	arXiv preprint arXiv:2402.04249, 2024	208	2024
Globally-Robust Neural Networks	K Leino, Z Wang, M Fredrikson	Proceedings of ICML 2021, 2021	155	2021
The WMDP benchmark: Measuring and reducing malicious use with unlearning	N Li, A Pan, A Gopal, S Yue, D Berrios, A Gatti, JD Li, AK Dombrowski, ...	arXiv preprint arXiv:2403.03218, 2024	107	2024
Smoothed Geometry for Robust Attribution	Z Wang, H Wang, S Ramkumar, M Fredrikson, P Mardziel, A Datta	Proceedings of NeurIPS 2020, 2020	57	2020
Consistent counterfactuals for deep models	E Black, Z Wang, M Fredrikson, A Datta	ICLR 2022, 2021	56	2021
Towards frequency-based explanation for robust CNN	Z Wang, Y Yang, A Shrivastava, V Rawal, Z Ding	arXiv preprint arXiv:2005.03141, 2020	56	2020
Can LLMs Follow Simple Rules?	N Mu, S Chen, Z Wang, S Chen, D Karamardian, L Aljeraisy, B Alomair, ...	arXiv preprint arXiv:2311.04235, 2023	33	2023
Robust models are more interpretable because attributions look normal	Z Wang, M Fredrikson, A Datta	arXiv preprint arXiv:2103.11257, 2021	28	2021
LLM defenses are not robust to multi-turn human jailbreaks yet	N Li, Z Han, I Steneker, W Primack, R Goodside, H Zhang, Z Wang, ...	arXiv preprint arXiv:2408.15221, 2024	27	2024
Interpreting interpretations: Organizing attribution methods by criteria	Z Wang, P Mardziel, A Datta, M Fredrikson	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2020	20	2020
Machine learning explainability and robustness: connected at the hip	A Datta, M Fredrikson, K Leino, K Lu, S Sen, Z Wang	Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data …, 2021	17	2021
Influence Patterns for Explaining Information Flow in BERT	K Lu, Z Wang, P Mardziel, A Datta	arXiv preprint arXiv:2011.00740, 2020	16	2020
A recipe for improved certifiable robustness: Capacity and data	K Hu, K Leino, Z Wang, M Fredrikson	ICLR 2024, 2023	9	2023
Unlocking deterministic robustness certification on ImageNet	K Hu, A Zou, Z Wang, K Leino, M Fredrikson	Advances in Neural Information Processing Systems 36, 42993-43011, 2023	8	2023
Improving robust generalization by direct PAC-Bayesian bound minimization	Z Wang, N Ding, T Levinboim, X Chen, R Soricut	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2023	8	2023
Learning modulo theories	M Fredrikson, K Lu, S Vijayakumar, S Jha, V Ganesh, Z Wang	arXiv preprint arXiv:2301.11435, 2023	7	2023
Transfer attacks and defenses for large language models on coding tasks	C Zhang, Z Wang, R Mangal, M Fredrikson, L Jia, C Pasareanu	arXiv preprint arXiv:2311.13445, 2023	5	2023
