Rohin Shah
Research Scientist, Google DeepMind
Verified email at deepmind.com - Homepage
Title
Cited by
Year
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ...
arXiv preprint arXiv:2403.05530, 2024
Cited by: 1031, Year: 2024
On the utility of learning about humans for human-AI coordination
M Carroll, R Shah, MK Ho, T Griffiths, S Seshia, P Abbeel, A Dragan
Advances in Neural Information Processing Systems, 5174-5185, 2019
Cited by: 463, Year: 2019
Chlorophyll: Synthesis-aided compiler for low-power spatial architectures
PM Phothilimthana, T Jelvis, R Shah, N Totla, S Chasins, R Bodik
ACM SIGPLAN Notices 49 (6), 396-407, 2014
Cited by: 95, Year: 2014
Optimal Policies Tend to Seek Power
AM Turner, L Smith, R Shah, A Critch, P Tadepalli
arXiv preprint arXiv:1912.01683, 2019
Cited by: 92, Year: 2019
Preferences Implicit in the State of the World
R Shah, D Krasheninnikov, J Alexander, P Abbeel, A Dragan
arXiv preprint arXiv:1902.04198, 2019
Cited by: 90*, Year: 2019
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals
R Shah, V Varma, R Kumar, M Phuong, V Krakovna, J Uesato, Z Kenton
arXiv preprint arXiv:2210.01790, 2022
Cited by: 80*, Year: 2022
On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference
R Shah, N Gundotra, P Abbeel, A Dragan
International Conference on Machine Learning, 5670-5679, 2019
Cited by: 80, Year: 2019
Does circuit analysis interpretability scale? evidence from multiple choice capabilities in chinchilla
T Lieberum, M Rahtz, J Kramár, N Nanda, G Irving, R Shah, V Mikulik
arXiv preprint arXiv:2307.09458, 2023
Cited by: 68, Year: 2023
The MAGICAL Benchmark for Robust Imitation
S Toyer, R Shah, A Critch, S Russell
Advances in Neural Information Processing Systems 33, 2020
Cited by: 53, Year: 2020
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2
T Lieberum, S Rajamanoharan, A Conmy, L Smith, N Sonnerat, V Varma, ...
arXiv preprint arXiv:2408.05147, 2024
Cited by: 52, Year: 2024
Evaluating Frontier Models for Dangerous Capabilities
M Phuong, M Aitchison, E Catt, S Cogan, A Kaskasoli, V Krakovna, ...
arXiv preprint arXiv:2403.13793, 2024
Cited by: 47, Year: 2024
Explaining grokking through circuit efficiency
V Varma, R Shah, Z Kenton, J Kramár, R Kumar
arXiv preprint arXiv:2309.02390, 2023
Cited by: 45, Year: 2023
Improving Dictionary Learning with Gated Sparse Autoencoders
S Rajamanoharan, A Conmy, L Smith, T Lieberum, V Varma, J Kramár, ...
arXiv preprint arXiv:2404.16014, 2024
Cited by: 44, Year: 2024
Benefits of Assistance over Reward Learning
R Shah, P Freire, N Alex, R Freedman, D Krasheninnikov, L Chan, ...
Cited by: 36, Year: 2020
Active Inverse Reward Design
S Mindermann, R Shah, A Gleave, D Hadfield-Menell
arXiv preprint arXiv:1809.03060, 2018
Cited by: 34, Year: 2018
Evaluating the Robustness of Collaborative Agents
P Knott, M Carroll, S Devlin, K Ciosek, K Hofmann, AD Dragan, R Shah
arXiv preprint arXiv:2101.05507, 2021
Cited by: 32, Year: 2021
An Empirical Investigation of Representation Learning for Imitation
X Chen, S Toyer, C Wild, S Emmons, I Fischer, KH Lee, N Alex, SH Wang, ...
Thirty-fifth Conference on Neural Information Processing Systems Datasets …, 2021
Cited by: 31, Year: 2021
The MineRL BASALT Competition on Learning from Human Feedback
R Shah, C Wild, SH Wang, N Alex, B Houghton, W Guss, S Mohanty, ...
arXiv preprint arXiv:2107.01969, 2021
Cited by: 26, Year: 2021
AtP*: An efficient and scalable method for localizing LLM behaviour to components
J Kramár, T Lieberum, R Shah, N Nanda
arXiv preprint arXiv:2403.00745, 2024
Cited by: 23, Year: 2024
Choice Set Misspecification in Reward Inference
R Freedman, R Shah, A Dragan
CEUR Workshop Proceedings, 2020
Cited by: 22, Year: 2020
Articles 1–20