Jan Leike
OpenAI
Verified email at openai.com
Title
Cited by
Year
Training language models to follow instructions with human feedback
L Ouyang, J Wu, X Jiang, D Almeida, C Wainwright, P Mishkin, C Zhang, ...
Advances in Neural Information Processing Systems 35, 27730-27744, 2022
Cited by: 12507, Year: 2022
GPT-4 technical report
OpenAI
arXiv, 2023
Cited by: 10731*, Year: 2023
Evaluating large language models trained on code
M Chen, J Tworek, H Jun, Q Yuan, HPO Pinto, J Kaplan, H Edwards, ...
arXiv preprint arXiv:2107.03374, 2021
Cited by: 4016, Year: 2021
Deep reinforcement learning from human preferences
PF Christiano, J Leike, T Brown, M Martic, S Legg, D Amodei
Advances in Neural Information Processing Systems 30, 4299-4307, 2017
Cited by: 3585, Year: 2017
Let's Verify Step by Step
H Lightman, V Kosaraju, Y Burda, H Edwards, B Baker, T Lee, J Leike, ...
arXiv preprint arXiv:2305.20050, 2023
Cited by: 650, Year: 2023
Reward learning from human preferences and demonstrations in Atari
B Ibarz, J Leike, T Pohlen, G Irving, S Legg, D Amodei
Advances in Neural Information Processing Systems, 8011-8023, 2018
Cited by: 464, Year: 2018
Scalable agent alignment via reward modeling: a research direction
J Leike, D Krueger, T Everitt, M Martic, V Maini, S Legg
arXiv preprint arXiv:1811.07871, 2018
Cited by: 379, Year: 2018
AI Safety Gridworlds
J Leike, M Martic, V Krakovna, PA Ortega, T Everitt, A Lefrancq, L Orseau, ...
arXiv preprint arXiv:1711.09883, 2017
Cited by: 362, Year: 2017
Recursively summarizing books with human feedback
J Wu, L Ouyang, DM Ziegler, N Stiennon, R Lowe, J Leike, P Christiano
arXiv preprint arXiv:2109.10862, 2021
Cited by: 282, Year: 2021
Language models can explain neurons in language models
S Bills, N Cammarata, D Mossing, H Tillman, L Gao, G Goh, I Sutskever, ...
URL https://openaipublic.blob.core.windows.net/neuron-explainer/paper …, 2023
Cited by: 248, Year: 2023
Self-critiquing models for assisting human evaluators
W Saunders, C Yeh, J Wu, S Bills, L Ouyang, J Ward, J Leike
arXiv preprint arXiv:2206.05802, 2022
Cited by: 240, Year: 2022
GPT-4o System Card
A Hurst, A Lerer, AP Goucher, A Perelman, A Ramesh, A Clark, AJ Ostrow, ...
arXiv preprint arXiv:2410.21276, 2024
Cited by: 225, Year: 2024
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision
C Burns, P Izmailov, JH Kirchner, B Baker, L Gao, L Aschenbrenner, ...
arXiv preprint arXiv:2312.09390, 2023
Cited by: 224, Year: 2023
Learning to Understand Goal Specifications by Modelling Reward
D Bahdanau, F Hill, J Leike, E Hughes, P Kohli, E Grefenstette
arXiv preprint arXiv:1806.01946, 2018
Cited by: 209*, Year: 2018
Ranking Templates for Linear Loops
J Leike, M Heizmann
Logical Methods in Computer Science, 2015
Cited by: 100, Year: 2015
Scaling and evaluating sparse autoencoders
L Gao, TD la Tour, H Tillman, G Goh, R Troll, A Radford, I Sutskever, ...
arXiv preprint arXiv:2406.04093, 2024
Cited by: 99, Year: 2024
Learning human objectives by evaluating hypothetical behavior
S Reddy, A Dragan, S Levine, S Legg, J Leike
International Conference on Machine Learning, 8020-8029, 2020
Cited by: 91, Year: 2020
Quantifying Differences in Reward Functions
A Gleave, M Dennis, S Legg, S Russell, J Leike
arXiv preprint arXiv:2006.13900, 2020
Cited by: 74, Year: 2020
Institutionalizing ethics in AI through broader impact requirements
CEA Prunkl, C Ashurst, M Anderljung, H Webb, J Leike, A Dafoe
Nature Machine Intelligence 3 (2), 104-110, 2021
Cited by: 72, Year: 2021
Linear ranking for linear lasso programs
M Heizmann, J Hoenicke, J Leike, A Podelski
Automated Technology for Verification and Analysis, 365-380, 2013
Cited by: 72, Year: 2013