Tom Everitt
Staff Research Scientist at Google DeepMind
Verified email at google.com
Title
Cited by
Year
Scalable agent alignment via reward modeling: a research direction
J Leike, D Krueger, T Everitt, M Martic, V Maini, S Legg
arXiv preprint arXiv:1811.07871, 2018
Cited by: 379 | Year: 2018
AI safety gridworlds
J Leike, M Martic, V Krakovna, PA Ortega, T Everitt, A Lefrancq, L Orseau, ...
arXiv preprint arXiv:1711.09883, 2017
Cited by: 362 | Year: 2017
Alignment of language agents
Z Kenton, T Everitt, L Weidinger, I Gabriel, V Mikulik, G Irving
arXiv preprint arXiv:2103.14659, 2021
Cited by: 167 | Year: 2021
AGI safety literature review
T Everitt, G Lea, M Hutter
International Joint Conference on AI (IJCAI), 2018
Cited by: 160 | Year: 2018
Count-based exploration in feature space for reinforcement learning
J Martin, SN Sasikumar, T Everitt, M Hutter
International Joint Conference on AI (IJCAI), 2017
Cited by: 151 | Year: 2017
Reinforcement Learning with Corrupted Reward Channel
T Everitt, V Krakovna, L Orseau, M Hutter, S Legg
26th International Joint Conference on Artificial Intelligence (IJCAI), 2017
Cited by: 131 | Year: 2017
Specification gaming: the flip side of AI ingenuity
V Krakovna, J Uesato, V Mikulik, M Rahtz, T Everitt, R Kumar, Z Kenton, ...
DeepMind Blog 3, 2020
Cited by: 123 | Year: 2020
Reward tampering problems and solutions in reinforcement learning: A causal influence diagram perspective
T Everitt, M Hutter, R Kumar, V Krakovna
Synthese, 2021
Cited by: 104 | Year: 2021
Shaking the foundations: delusions in sequence models for interaction and control
PA Ortega, M Kunesch, G Delétang, T Genewein, J Grau-Moya, J Veness, ...
arXiv preprint arXiv:2110.10819, 2021
Cited by: 65 | Year: 2021
Agent incentives: A causal perspective
T Everitt, R Carey, ED Langlois, PA Ortega, S Legg
Proceedings of the AAAI Conference on Artificial Intelligence 35 (13), 11487 …, 2021
Cited by: 59 | Year: 2021
Avoiding wireheading with value reinforcement learning
T Everitt, M Hutter
International Conference on Artificial General Intelligence (AGI), 12-22, 2016
Cited by: 53 | Year: 2016
Towards safe artificial general intelligence
T Everitt
PQDT-Global, 2019
Cited by: 41 | Year: 2019
Robust agents learn causal world models
J Richens, T Everitt
The Twelfth International Conference on Learning Representations, 2024
Cited by: 39 | Year: 2024
Discovering agents
Z Kenton, R Kumar, S Farquhar, J Richens, M MacDermott, T Everitt
Artificial Intelligence 322, 103963, 2023
Cited by: 38 | Year: 2023
Universal artificial intelligence: Practical agents and fundamental challenges
T Everitt, M Hutter
Foundations of trusted autonomy, 15-46, 2018
Cited by: 36 | Year: 2018
Understanding agent incentives using causal influence diagrams. Part I: Single action settings
T Everitt, PA Ortega, E Barnes, S Legg
arXiv preprint arXiv:1902.09980, 2019
Cited by: 33 | Year: 2019
Self-modification of policy and utility function in rational agents
T Everitt, D Filan, M Daswani, M Hutter
International Conference on Artificial General Intelligence (AGI), 1-11, 2016
Cited by: 33 | Year: 2016
Honesty is the best policy: defining and mitigating AI deception
F Ward, F Toni, F Belardinelli, T Everitt
Advances in neural information processing systems 36, 2313-2341, 2023
Cited by: 30 | Year: 2023
Path-specific objectives for safer agent incentives
S Farquhar, R Carey, T Everitt
Proceedings of the AAAI Conference on Artificial Intelligence 36 (9), 9529-9538, 2022
Cited by: 29 | Year: 2022
A game-theoretic analysis of the off-switch game
T Wängberg, M Böörs, E Catt, T Everitt, M Hutter
Artificial General Intelligence: 10th International Conference, AGI 2017 …, 2017
Cited by: 27 | Year: 2017
Articles 1–20