Wide & deep learning for recommender systems HT Cheng, L Koc, J Harmsen, T Shaked, T Chandra, H Aradhye, ... Proceedings of the 1st workshop on deep learning for recommender systems, 7-10, 2016 | 4413 | 2016 |
Gemini: a family of highly capable multimodal models Gemini Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 3253 | 2023 |
PaLM 2 technical report R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 1583 | 2023 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context Gemini Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 1159 | 2024 |
Gemma: Open models based on Gemini research and technology Gemma Team, T Mesnard, C Hardin, R Dadashi, S Bhupatiraju, S Pathak, ... arXiv preprint arXiv:2403.08295, 2024 | 1044 | 2024 |
Large scale distributed neural network training through online distillation R Anil, G Pereyra, A Passos, R Ormandi, GE Dahl, GE Hinton arXiv preprint arXiv:1804.03235, 2018 | 518 | 2018 |
Knowledge distillation: A good teacher is patient and consistent L Beyer, X Zhai, A Royer, L Markeeva, R Anil, A Kolesnikov Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2022 | 341 | 2022 |
Efficiently identifying task groupings for multi-task learning C Fifty, E Amid, Z Zhao, T Yu, R Anil, C Finn Advances in Neural Information Processing Systems 34, 27503-27516, 2021 | 289 | 2021 |
Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... arXiv preprint arXiv:1902.08295, 2019 | 212 | 2019 |
TF-Ranking: Scalable TensorFlow library for learning-to-rank RK Pasumarthi, S Bruch, X Wang, C Li, M Bendersky, M Najork, J Pfeifer, ... Proceedings of the 25th ACM SIGKDD International Conference on Knowledge …, 2019 | 163 | 2019 |
Large-scale differentially private BERT R Anil, B Ghazi, V Gupta, R Kumar, P Manurangsi arXiv preprint arXiv:2108.01624, 2021 | 152 | 2021 |
Robust bi-tempered logistic loss based on Bregman divergences E Amid, MKK Warmuth, R Anil, T Koren Advances in Neural Information Processing Systems 32, 2019 | 146 | 2019 |
Scalable second order optimization for deep learning R Anil, V Gupta, T Koren, K Regan, Y Singer arXiv preprint arXiv:2002.09018, 2020 | 101 | 2020 |
Memory efficient adaptive optimization R Anil, V Gupta, T Koren, Y Singer Advances in Neural Information Processing Systems 32, 2019 | 92 | 2019 |
PaLM 2 technical report. arXiv 2023 R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 84 | 2023 |
Sunipa Dev R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vladimir Feinberg, Fangxiaoyu …, 2023 | 77 | 2023 |
A large batch optimizer reality check: Traditional, generic optimizers suffice across batch sizes Z Nado, JM Gilmer, CJ Shallue, R Anil, GE Dahl arXiv preprint arXiv:2102.06356, 2021 | 42 | 2021 |
Wide and deep machine learning models T Shaked, R Anil, HB Aradhye, G Anderson, W Chai, ML Koc, J Harmsen, ... US Patent 10,762,422, 2020 | 42 | 2020 |
Disentangling adaptive gradient methods from learning rates N Agarwal, R Anil, E Hazan, T Koren, C Zhang arXiv preprint arXiv:2002.11803, 2020 | 42 | 2020 |
On the factory floor: ML engineering for industrial-scale ads recommendation models R Anil, S Gadanho, D Huang, N Jacob, Z Li, D Lin, T Phillips, C Pop, ... arXiv preprint arXiv:2209.05310, 2022 | 34 | 2022 |