Janos Kramar
DeepMind
Verified email at google.com
Title
Cited by
Year
Zoneout: Regularizing RNNs by randomly preserving hidden activations
D Krueger, T Maharaj, J Kramár, M Pezeshki, N Ballas, NR Ke, A Goyal, ...
arXiv preprint arXiv:1606.01305, 2016
Cited by 393 (2016)
Reinforcement and imitation learning for diverse visuomotor skills
Y Zhu, Z Wang, J Merel, A Rusu, T Erez, S Cabi, S Tunyasuvunakool, ...
arXiv preprint arXiv:1802.09564, 2018
Cited by 382 (2018)
OpenSpiel: A framework for reinforcement learning in games
M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ...
arXiv preprint arXiv:1908.09453, 2019
Cited by 301 (2019)
Guidelines for artificial intelligence containment
J Babcock, J Kramar, RV Yampolskiy
Next-Generation Ethics: Engineering a Better Society (Ed. Ali E. Abbas), 90-112, 2019
Cited by 68 (2019)
Tracr: Compiled transformers as a laboratory for interpretability
D Lindner, J Kramár, S Farquhar, M Rahtz, T McGrath, V Mikulik
Advances in Neural Information Processing Systems 36, 2024
Cited by 66 (2024)
The AGI containment problem
J Babcock, J Kramár, R Yampolskiy
Artificial General Intelligence: 9th International Conference, AGI 2016, New …, 2016
Cited by 65 (2016)
Does circuit analysis interpretability scale? Evidence from multiple choice capabilities in Chinchilla
T Lieberum, M Rahtz, J Kramár, N Nanda, G Irving, R Shah, V Mikulik
arXiv preprint arXiv:2307.09458, 2023
Cited by 63 (2023)
Learning to play no-press Diplomacy with best response policy iteration
T Anthony, T Eccles, A Tacchetti, J Kramár, I Gemp, T Hudson, N Porcel, ...
Advances in Neural Information Processing Systems 33, 17987-18003, 2020
Cited by 58 (2020)
Learning reciprocity in complex sequential social dilemmas
T Eccles, E Hughes, J Kramár, S Wheelwright, JZ Leibo
arXiv preprint arXiv:1903.08082, 2019
Cited by 52 (2019)
Negotiation and honesty in artificial intelligence methods for the board game of Diplomacy
J Kramár, T Eccles, I Gemp, A Tacchetti, KR McKee, M Malinowski, ...
Nature Communications 13 (1), 7214, 2022
Cited by 51 (2022)
Gemma Scope: Open sparse autoencoders everywhere all at once on Gemma 2
T Lieberum, S Rajamanoharan, A Conmy, L Smith, N Sonnerat, V Varma, ...
arXiv preprint arXiv:2408.05147, 2024
Cited by 49 (2024)
Explaining grokking through circuit efficiency
V Varma, R Shah, Z Kenton, J Kramár, R Kumar
arXiv preprint arXiv:2309.02390, 2023
Cited by 44 (2023)
The hydra effect: Emergent self-repair in language model computations
T McGrath, M Rahtz, J Kramar, V Mikulik, S Legg
arXiv preprint arXiv:2307.15771, 2023
Cited by 40 (2023)
Improving dictionary learning with gated sparse autoencoders
S Rajamanoharan, A Conmy, L Smith, T Lieberum, V Varma, J Kramár, ...
arXiv preprint arXiv:2404.16014, 2024
Cited by 39 (2024)
Reinforcement and imitation learning for a task
S Tunyasuvunakool, Y Zhu, J Merel, J Kramar, Z Wang, NMO Heess
US Patent App. 16/174,112, 2019
Cited by 35 (2019)
Jumping ahead: Improving reconstruction fidelity with JumpReLU sparse autoencoders
S Rajamanoharan, T Lieberum, N Sonnerat, A Conmy, V Varma, J Kramár, ...
arXiv preprint arXiv:2407.14435, 2024
Cited by 29 (2024)
OpenSpiel: a framework for reinforcement learning in games. CoRR abs/1908.09453 (2019)
M Lanctot, E Lockhart, JB Lespiau, V Zambaldi, S Upadhyay, J Pérolat, ...
arXiv preprint arXiv:1908.09453, 2019
Cited by 29 (2019)
AtP*: An efficient and scalable method for localizing LLM behaviour to components
J Kramár, T Lieberum, R Shah, N Nanda
arXiv preprint arXiv:2403.00745, 2024
Cited by 21 (2024)
Sample-based approximation of Nash in large many-player games via gradient descent
I Gemp, R Savani, M Lanctot, Y Bachrach, T Anthony, R Everett, ...
arXiv preprint arXiv:2106.01285, 2021
Cited by 20 (2021)
On scalable oversight with weak LLMs judging strong LLMs
Z Kenton, NY Siegel, J Kramár, J Brown-Cohen, S Albanie, J Bulian, ...
arXiv preprint arXiv:2407.04622, 2024
Cited by 13 (2024)
Articles 1–20