| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| Gemini: a family of highly capable multimodal models | G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... | arXiv preprint arXiv:2312.11805 | 2510 | 2023 |
| PaLM 2 technical report | R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... | arXiv preprint arXiv:2305.10403 | 1561 | 2023 |
| Beyond the imitation game: Quantifying and extrapolating the capabilities of language models | A Srivastava, A Rastogi, A Rao, AAM Shoeb, A Abid, A Fisch, AR Brown, ... | arXiv preprint arXiv:2206.04615 | 1298 | 2022 |
| Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context | G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... | arXiv preprint arXiv:2403.05530 | 989 | 2024 |
| Solving quantitative reasoning problems with language models | A Lewkowycz, A Andreassen, D Dohan, E Dyer, H Michalewski, ... | Advances in Neural Information Processing Systems 35, 3843-3857 | 675 | 2022 |
| Explaining neural scaling laws | Y Bahri, E Dyer, J Kaplan, J Lee, U Sharma | Proceedings of the National Academy of Sciences 121 (27), e2311878121 | 246 | 2024 |
| The large learning rate phase of deep learning: the catapult mechanism | A Lewkowycz, Y Bahri, E Dyer, J Sohl-Dickstein, G Gur-Ari | arXiv preprint arXiv:2003.02218 | 244 | 2020 |
| Boundary terms, variational principles, and higher derivative modified gravity | E Dyer, K Hinterbichler | Physical Review D: Particles, Fields, Gravitation, and Cosmology 79 (2), 024028 | 239 | 2009 |
| Gradient descent happens in a tiny subspace | G Gur-Ari, DA Roberts, E Dyer | arXiv preprint arXiv:1812.04754 | 221 | 2018 |
| Exploring length generalization in large language models | C Anil, Y Wu, A Andreassen, A Lewkowycz, V Misra, V Ramasesh, ... | Advances in Neural Information Processing Systems 35, 38546-38556 | 200 | 2022 |
| Anatomy of catastrophic forgetting: Hidden representations and task semantics | VV Ramasesh, E Dyer, M Raghu | arXiv preprint arXiv:2007.07400 | 194 | 2020 |
| Effect of scale on catastrophic forgetting in neural networks | VV Ramasesh, A Lewkowycz, E Dyer | International Conference on Learning Representations | 181 | 2021 |
| When do curricula work? | X Wu, E Dyer, B Neyshabur | arXiv preprint arXiv:2012.03107 | 141 | 2020 |
| Asymptotics of wide networks from Feynman diagrams | E Dyer, G Gur-Ari | arXiv preprint arXiv:1909.11304 | 127 | 2019 |
| Block-recurrent transformers | DL Hutchins, I Schlag, Y Wu, E Dyer, B Neyshabur | Advances in Neural Information Processing Systems 35, 33248-33261 | 120 | 2022 |
| Universal bounds on charged states in 2d CFT and 3d gravity | N Benjamin, E Dyer, AL Fitzpatrick, S Kachru | Journal of High Energy Physics 2016 (8), 1-26 | 96 | 2016 |
| Beyond human data: Scaling self-training for problem-solving with language models | A Singh, JD Co-Reyes, R Agarwal, A Anand, P Patil, X Garcia, PJ Liu, ... | arXiv preprint arXiv:2312.06585 | 87 | 2023 |
| Scaling dimensions of monopole operators in the CP^{N_b-1} theory in 2+1 dimensions | E Dyer, M Mezei, SS Pufu, S Sachdev | Journal of High Energy Physics 2015 (6), 1-48 | 84 | 2015 |
| 2D CFT partition functions at late times | E Dyer, G Gur-Ari | Journal of High Energy Physics 2017 (8), 1-35 | 81 | 2017 |
| WhichTF is functionally important in your open chromatin data? | Y Tanigawa, ES Dyer, G Bejerano | PLOS Computational Biology 18 (8), e1010378 | 77 | 2022 |