Joar Skalse
DPhil Student in Computer Science, Oxford University
Verified email at cs.ox.ac.uk
Title
Cited by
Year
Defining and characterizing reward gaming
J Skalse, N Howe, D Krasheninnikov, D Krueger
Advances in Neural Information Processing Systems 35, 9460-9471, 2022
Cited by 241 · 2022
Risks from learned optimization in advanced machine learning systems
E Hubinger, C van Merwijk, V Mikulik, J Skalse, S Garrabrant
arXiv preprint arXiv:1906.01820, 2019
Cited by 151 · 2019
Is SGD a Bayesian sampler? Well, almost
C Mingard, G Valle-Pérez, J Skalse, AA Louis
Journal of Machine Learning Research 22 (79), 1-64, 2021
Cited by 57 · 2021
Invariance in policy optimisation and partial identifiability in reward learning
JMV Skalse, M Farrugia-Roberts, S Russell, A Abate, A Gleave
International Conference on Machine Learning, 32033-32058, 2023
Cited by 51 · 2023
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
D Dalrymple, J Skalse, Y Bengio, S Russell, M Tegmark, S Seshia, ...
arXiv preprint arXiv:2405.06624, 2024
Cited by 37 · 2024
Neural networks are a priori biased towards boolean functions with low entropy
C Mingard, J Skalse, G Valle-Pérez, D Martínez-Rubio, V Mikulik, ...
arXiv preprint arXiv:1909.11522, 2019
Cited by 33 · 2019
Misspecification in inverse reinforcement learning
J Skalse, A Abate
Proceedings of the AAAI Conference on Artificial Intelligence 37 (12), 15136 …, 2023
Cited by 31 · 2023
Lexicographic multi-objective reinforcement learning
J Skalse, L Hammond, C Griffin, A Abate
arXiv preprint arXiv:2212.13769, 2022
Cited by 27 · 2022
Reinforcement learning in Newcomblike environments
J Bell, L Linsefors, C Oesterheld, J Skalse
Advances in Neural Information Processing Systems 34, 22146-22157, 2021
Cited by 17 · 2021
On the limitations of Markovian rewards to express multi-objective, risk-sensitive, and modal tasks
J Skalse, A Abate
Uncertainty in Artificial Intelligence, 1974-1984, 2023
Cited by 13 · 2023
Goodhart's Law in Reinforcement Learning
J Karwowski, O Hayman, X Bai, K Kiendlhofer, C Griffin, J Skalse
arXiv preprint arXiv:2310.09144, 2023
Cited by 12 · 2023
STARC: A General Framework For Quantifying Differences Between Reward Functions
J Skalse, L Farnik, SR Motwani, E Jenner, A Gleave, A Abate
arXiv preprint arXiv:2309.15257, 2023
Cited by 7 · 2023
The reward hypothesis is false
JMV Skalse, A Abate
Cited by 5 · 2022
Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification
J Skalse, A Abate
arXiv preprint arXiv:2403.06854, 2024
Cited by 4 · 2024
On The Expressivity of Objective-Specification Formalisms in Reinforcement Learning
R Subramani, M Williams, M Heitmann, H Holm, C Griffin, J Skalse
arXiv preprint arXiv:2310.11840, 2023
Cited by 4 · 2023
A general framework for reward function distances
E Jenner, JMV Skalse, A Gleave
NeurIPS ML Safety Workshop, 2022
Cited by 4 · 2022
All’s Well That Ends Well: Avoiding Side Effects with Distance-Impact Penalties
C Griffin, JMV Skalse, L Hammond, A Abate
NeurIPS ML Safety Workshop, 2022
Cited by 2 · 2022
A General Counterexample to Any Decision Theory and Some Responses
J Skalse
arXiv preprint arXiv:2101.00280, 2021
Cited by 2 · 2021
Safety Properties of Inductive Logic Programming.
G Leech, N Schoots, J Skalse
SafeAI@AAAI, 2021
Cited by 2 · 2021
Articles 1–20