Rohin Shah

Cited by

	All	Since 2020
Citations	2539	2475
h-index	20	20
i10-index	25	25

Citations per year

Year	Citations
2016	16
2017	16
2018	10
2019	14
2020	61
2021	119
2022	190
2023	372
2024	1496
2025	232

Public access

4 articles available
0 articles not available

Based on funding mandates


Rohin Shah

Research Scientist, Google DeepMind

Verified email at deepmind.com

AI alignment


Title	Cited by	Year
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024	1031	2024
On the utility of learning about humans for human-AI coordination M Carroll, R Shah, MK Ho, T Griffiths, S Seshia, P Abbeel, A Dragan Advances in Neural Information Processing Systems, 5174-5185, 2019	463	2019
Chlorophyll: Synthesis-aided compiler for low-power spatial architectures PM Phothilimthana, T Jelvis, R Shah, N Totla, S Chasins, R Bodik ACM SIGPLAN Notices 49 (6), 396-407, 2014	95	2014
Optimal Policies Tend to Seek Power AM Turner, L Smith, R Shah, A Critch, P Tadepalli arXiv preprint arXiv:1912.01683, 2019	92	2019
Preferences Implicit in the State of the World R Shah, D Krasheninnikov, J Alexander, P Abbeel, A Dragan arXiv preprint arXiv:1902.04198, 2019	90*	2019
Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals R Shah, V Varma, R Kumar, M Phuong, V Krakovna, J Uesato, Z Kenton arXiv preprint arXiv:2210.01790, 2022	80*	2022
On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference R Shah, N Gundotra, P Abbeel, A Dragan International Conference on Machine Learning, 5670-5679, 2019	80	2019
Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla T Lieberum, M Rahtz, J Kramár, N Nanda, G Irving, R Shah, V Mikulik arXiv preprint arXiv:2307.09458, 2023	68	2023
The MAGICAL Benchmark for Robust Imitation S Toyer, R Shah, A Critch, S Russell Advances in Neural Information Processing Systems 33, 2020	53	2020
Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 T Lieberum, S Rajamanoharan, A Conmy, L Smith, N Sonnerat, V Varma, ... arXiv preprint arXiv:2408.05147, 2024	52	2024
Evaluating Frontier Models for Dangerous Capabilities M Phuong, M Aitchison, E Catt, S Cogan, A Kaskasoli, V Krakovna, ... arXiv preprint arXiv:2403.13793, 2024	47	2024
Explaining grokking through circuit efficiency V Varma, R Shah, Z Kenton, J Kramár, R Kumar arXiv preprint arXiv:2309.02390, 2023	45	2023
Improving Dictionary Learning with Gated Sparse Autoencoders S Rajamanoharan, A Conmy, L Smith, T Lieberum, V Varma, J Kramár, ... arXiv preprint arXiv:2404.16014, 2024	44	2024
Benefits of Assistance over Reward Learning R Shah, P Freire, N Alex, R Freedman, D Krasheninnikov, L Chan, ...	36	2020
Active Inverse Reward Design S Mindermann, R Shah, A Gleave, D Hadfield-Menell arXiv preprint arXiv:1809.03060, 2018	34	2018
Evaluating the Robustness of Collaborative Agents P Knott, M Carroll, S Devlin, K Ciosek, K Hofmann, AD Dragan, R Shah arXiv preprint arXiv:2101.05507, 2021	32	2021
An Empirical Investigation of Representation Learning for Imitation X Chen, S Toyer, C Wild, S Emmons, I Fischer, KH Lee, N Alex, SH Wang, ... Thirty-fifth Conference on Neural Information Processing Systems Datasets …, 2021	31	2021
The MineRL BASALT Competition on Learning from Human Feedback R Shah, C Wild, SH Wang, N Alex, B Houghton, W Guss, S Mohanty, ... arXiv preprint arXiv:2107.01969, 2021	26	2021
AtP*: An efficient and scalable method for localizing LLM behaviour to components J Kramár, T Lieberum, R Shah, N Nanda arXiv preprint arXiv:2403.00745, 2024	23	2024
Choice Set Misspecification in Reward Inference R Freedman, R Shah, A Dragan CEUR Workshop Proceedings, 2020	22	2020


Articles 1–20
