Micah Carroll
PhD student, UC Berkeley
Verified email at berkeley.edu
Title | Cited by | Year
Open problems and fundamental limitations of reinforcement learning from human feedback
S Casper, X Davies, C Shi, TK Gilbert, J Scheurer, J Rando, R Freedman, ...
arXiv preprint arXiv:2307.15217, 2023
Cited by: 478; Year: 2023
On the Utility of Learning About Humans for Human-AI Coordination
M Carroll, R Shah, MK Ho, T Griffiths, S Seshia, P Abbeel, A Dragan
Advances in Neural Information Processing Systems, 2019, 5174-5185, 2019
Cited by: 471; Year: 2019
Harms from Increasingly Agentic Algorithmic Systems
A Chan, R Salganik, A Markelius, C Pang, N Rajkumar, D Krasheninnikov, ...
Proceedings of the 2023 ACM Conference on Fairness, Accountability, and …, 2023
Cited by: 121*; Year: 2023
Estimating and Penalizing Induced Preference Shifts in Recommender Systems
M Carroll, A Dragan, S Russell, D Hadfield-Menell
International Conference on Machine Learning, 2022 (Spotlight), 2686-2708, 2022
Cited by: 77*; Year: 2022
Characterizing Manipulation from AI Systems
M Carroll*, A Chan*, H Ashton, D Krueger
EEAMO 2023, 2023
Cited by: 63; Year: 2023
Engagement, user satisfaction, and the amplification of divisive content on social media
S Milli, M Carroll, Y Wang, S Pandey, S Zhao, AD Dragan
arXiv preprint arXiv:2305.16941, 2023
Cited by: 52*; Year: 2023
Uni[MASK]: Unified inference in sequential decision problems
M Carroll, O Paradise, J Lin, R Georgescu, M Sun, D Bignell, S Milani, ...
NeurIPS 2022 (Oral), 2022
Cited by: 41*; Year: 2022
Evaluating the Robustness of Collaborative Agents
P Knott, M Carroll, S Devlin, K Ciosek, K Hofmann, AD Dragan, R Shah
AAMAS 2021 (Extended Abstract), 2021
Cited by: 32; Year: 2021
Beyond Preferences in AI Alignment
T Zhi-Xuan, M Carroll, M Franklin, H Ashton
Philosophical Studies, 1-51, 2024
Cited by: 16; Year: 2024
AI Alignment with Changing and Influenceable Reward Functions
M Carroll, D Foote, A Siththaranjan, S Russell, A Dragan
arXiv preprint arXiv:2405.17713, 2024
Cited by: 16; Year: 2024
Optimal Behavior Prior: Data-Efficient Human Models for Improved Human-AI Collaboration
M Yang, M Carroll, A Dragan
NeurIPS 2022 Human in the Loop Learning (HiLL) Workshop, 2022
Cited by: 10; Year: 2022
Humanity's Last Exam
L Phan, A Gatti, Z Han, N Li, J Hu, H Zhang, S Shi, M Choi, A Agrawal, ...
arXiv preprint arXiv:2501.14249, 2025
Cited by: 8; Year: 2025
On Targeted Manipulation and Deception when Optimizing LLMs for User Feedback
M Williams*, M Carroll*, A Narang, C Weisser, B Murphy, A Dragan
arXiv preprint arXiv:2411.02306, 2024
Cited by: 7*; Year: 2024
Who Needs to Know? Minimal Knowledge for Optimal Coordination
N Lauffer, A Shah, M Carroll, MD Dennis, S Russell
International Conference on Machine Learning 2023, 18599-18613, 2023
Cited by: 5; Year: 2023
Time-Efficient Reward Learning via Visually Assisted Cluster Ranking
D Zhang, M Carroll, A Bobu, A Dragan
NeurIPS 2022 Human in the Loop Learning (HiLL) Workshop, 2022
Cited by: 5; Year: 2022
Overview of current AI alignment approaches
M Carroll
Cited by: 3; Year: 2018