Dmitrii Krasheninnikov
Verified email at cam.ac.uk
Title · Cited by · Year
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
S Casper*, X Davies*, C Shi, TK Gilbert, J Scheurer, J Rando, ...
TMLR, 2023
Cited by 459, 2023
Defining and Characterizing Reward Hacking
J Skalse*, NHR Howe, D Krasheninnikov, D Krueger*
Advances in Neural Information Processing Systems 35, 2022
Cited by 243, 2022
Harms from Increasingly Agentic Algorithmic Systems
A Chan, R Salganik, A Markelius, C Pang, N Rajkumar, D Krasheninnikov, ...
Proceedings of the 2023 ACM Conference on Fairness, Accountability, and …, 2023
Cited by 112*, 2023
Preferences Implicit in the State of the World
R Shah*, D Krasheninnikov*, J Alexander, P Abbeel, A Dragan
International Conference on Learning Representations, 2019
Cited by 90*, 2019
Benefits of Assistance over Reward Learning
R Shah, P Freire, N Alex, R Freedman, D Krasheninnikov, L Chan, ...
NeurIPS Workshop on Cooperative AI, best paper award, 2020
Cited by 36, 2020
Implicit meta-learning may lead language models to trust more reliable sources
D Krasheninnikov*, E Krasheninnikov*, B Mlodozeniec, T Maharaj, ...
ICML 2024, arXiv:2310.15047, 2023
Cited by 13*, 2023
Stress-Testing Capability Elicitation With Password-Locked Models
R Greenblatt*, F Roger*, D Krasheninnikov, D Krueger
Advances in Neural Information Processing Systems 37, 2024
Cited by 10, 2024
Assistance with large language models
D Krasheninnikov*, E Krasheninnikov*, D Krueger
NeurIPS ML Safety Workshop, 2022
Cited by 10, 2022
Combining reward information from multiple sources
D Krasheninnikov, R Shah, H van Hoof
NeurIPS Workshop on Learning with Rich Experience, 2019
Cited by 5, 2019
Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks
M Brumley, J Kwon, D Krueger, D Krasheninnikov, U Anwar
arXiv preprint arXiv:2411.07213, 2024
Cited by 1, 2024
Steering Clear: A Systematic Study of Activation Steering in a Toy Setup
D Krasheninnikov, D Krueger
NeurIPS Workshop on Foundation Model Interventions (MINT), 2024
2024