Dylan Hadfield-Menell

Cited by

	All	Since 2020
Citations	4193	3685
h-index	31	30
i10-index	46	45

Citations per year

Year	Citations
2016	19
2017	78
2018	168
2019	198
2020	334
2021	413
2022	443
2023	824
2024	1556
2025	105

Public access (based on funding mandates)

Available	16 articles
Not available	0 articles

Co-authors

Name	Affiliation	Verified email
Anca D Dragan	Assistant Professor at UC Berkeley // Director, AI Safety and Alignment, Google DeepMind	berkeley.edu
Stuart Russell	Professor of Computer Science, University of California, Berkeley	cs.berkeley.edu
Pieter Abbeel	UC Berkeley | Covariant	cs.berkeley.edu
Stephen Casper	PhD student, MIT	mit.edu
Gillian K. Hadfield	Johns Hopkins University, Dept of Computer Science and School of Government and Policy	jhu.edu
Andreas Haupt	Stanford University	stanford.edu
Smitha Milli	Meta FAIR	meta.com
Thomas L. Griffiths	Professor of Psychology and Computer Science, Princeton University	princeton.edu
Rohan Chitnis	Meta AI, MIT, UC Berkeley	fb.com
McKane Andrus	UW HCDE	uw.edu
Jaime Fernández Fisac	Assistant Professor of Electrical and Computer Engineering, Princeton University	princeton.edu
Sandy H Huang	Research Scientist, DeepMind	berkeley.edu
Joel Z Leibo	Research scientist	google.com
Simon Zhuang		berkeley.edu
Robert D. Hawkins	Stanford University	stanford.edu
Mark Ho	Assistant Professor, New York University	nyu.edu
Siddharth Srivastava	Arizona State University	asu.edu
Micah Carroll	PhD student, UC Berkeley	berkeley.edu
Gokul Swamy	PhD Candidate, Carnegie Mellon University	andrew.cmu.edu
Gabriel Kreiman	Professor, Harvard Medical School and Children's Hospital	tch.harvard.edu

Dylan Hadfield-Menell

Massachusetts Institute of Technology

Verified email at csail.mit.edu

Artificial Intelligence


Title	Authors	Venue	Cited by	Year
Cooperative Inverse Reinforcement Learning	D Hadfield-Menell, SJ Russell, P Abbeel, A Dragan	Advances in Neural Information Processing Systems 29, 2016	840	2016
Inverse Reward Design	D Hadfield-Menell, S Milli, P Abbeel, SJ Russell, A Dragan	Advances in Neural Information Processing Systems 30, 2017	507	2017
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback	S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...	Transactions on Machine Learning Research, 2023	429	2023
Toward Transparent AI: A survey on interpreting the inner structures of deep neural networks	T Räuker, A Ho, S Casper, D Hadfield-Menell	2023 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML), 464-483, 2023	189	2023
The off-switch game	D Hadfield-Menell, A Dragan, P Abbeel, S Russell	Proceedings of the Twenty-Sixth International Joint Conference on Artificial …, 2017	189	2017
On the geometry of adversarial examples	M Khoury, D Hadfield-Menell	arXiv preprint arXiv:1811.00525, 2018	113*	2018
Incomplete contracting and AI alignment	D Hadfield-Menell, GK Hadfield	Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 417-422, 2019	108	2019
Pragmatic-pedagogic value alignment	JF Fisac, MA Gates, JB Hamrick, C Liu, D Hadfield-Menell, ...	Robotics research: the 18th international symposium ISRR, 49-57, 2020	106	2020
Guided search for task and motion plans using learned heuristics	R Chitnis, D Hadfield-Menell, A Gupta, S Srivastava, E Groshev, C Lin, ...	2016 IEEE International Conference on Robotics and Automation (ICRA), 447-454, 2016	95	2016
Consequences of Misaligned AI	S Zhuang, D Hadfield-Menell	Advances in Neural Information Processing Systems 33, 15763-15773, 2020	93	2020
Explore, establish, exploit: Red teaming language models from scratch	S Casper, J Lin, J Kwon, G Culp, D Hadfield-Menell	arXiv preprint arXiv:2306.09442, 2023	84	2023
What are you optimizing for? aligning recommender systems with human values	J Stray, I Vendrov, J Nixon, S Adler, D Hadfield-Menell	arXiv preprint arXiv:2107.10939, 2021	83	2021
Should robots be obedient?	S Milli, D Hadfield-Menell, A Dragan, S Russell	Proceedings of the 26th International Joint Conference on Artificial …, 2017	81	2017
Building Human Values into Recommender Systems: An Interdisciplinary Synthesis and Open Problems	J Stray, A Halevy, P Assar, D Hadfield-Menell, C Boutilier, A Ashar, ...	ACM Transactions on Recommender Systems, 2023	70*	2023
On the utility of model learning in HRI	R Choudhury, G Swamy, D Hadfield-Menell, AD Dragan	2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019	68	2019
Conservative Agency via Attainable Utility Preservation	AM Turner, D Hadfield-Menell, P Tadepalli	Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, 385-391, 2020	67	2020
Expressive robot motion timing	A Zhou, D Hadfield-Menell, A Nagabandi, AD Dragan	Proceedings of the 2017 ACM/IEEE international conference on human-robot …, 2017	67	2017
Spurious normativity enhances learning of compliance and enforcement behavior in artificial agents	R Köster, D Hadfield-Menell, R Everett, L Weidinger, GK Hadfield, ...	Proceedings of the National Academy of Sciences 119 (3), e2106028118, 2022	60*	2022
Modular task and motion planning in belief space	D Hadfield-Menell, E Groshev, R Chitnis, P Abbeel	2015 IEEE/RSJ International Conference on Intelligent Robots and Systems …, 2015	60	2015
Black-Box Access is Insufficient for Rigorous AI Audits	S Casper, C Ezell, C Siegmann, N Kolt, TL Curtis, B Bucknall, A Haupt, ...	Proceedings of the 2024 ACM Conference on Fairness, Accountability, and …, 2024	59	2024
