Gemini: a family of highly capable multimodal models G Team, R Anil, S Borgeaud, JB Alayrac, J Yu, R Soricut, J Schalkwyk, ... arXiv preprint arXiv:2312.11805, 2023 | 3227 | 2023 |
LaMDA: Language models for dialog applications R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ... arXiv preprint arXiv:2201.08239, 2022 | 1670 | 2022 |
PaLM 2 technical report R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 1576 | 2023 |
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context G Team, P Georgiev, VI Lei, R Burnell, L Bai, A Gulati, G Tanzer, ... arXiv preprint arXiv:2403.05530, 2024 | 1180 | 2024 |
GShard: Scaling giant models with conditional computation and automatic sharding D Lepikhin, HJ Lee, Y Xu, D Chen, O Firat, Y Huang, M Krikun, N Shazeer, ... arXiv preprint arXiv:2006.16668, 2020 | 1109 | 2020 |
GLaM: Efficient scaling of language models with mixture-of-experts N Du, Y Huang, AM Dai, S Tong, D Lepikhin, Y Xu, M Krikun, Y Zhou, ... International conference on machine learning, 5547-5569, 2022 | 638 | 2022 |
Massively multilingual neural machine translation in the wild: Findings and challenges N Arivazhagan, A Bapna, O Firat, D Lepikhin, M Johnson, M Krikun, ... arXiv preprint arXiv:1907.05019, 2019 | 428 | 2019 |
Lingvo: a modular and scalable framework for sequence-to-sequence modeling J Shen, P Nguyen, Y Wu, Z Chen, MX Chen, Y Jia, A Kannan, T Sainath, ... arXiv preprint arXiv:1902.08295, 2019 | 212 | 2019 |
GLaM: Efficient scaling of language models with mixture-of-experts N Du, Y Huang, AM Dai, S Tong, D Lepikhin, Y Xu, M Krikun, Y Zhou, ... Proceedings of the 39th International Conference on Machine Learning 162 …, 2022 | 185 | 2022 |
GSPMD: general and scalable parallelization for ML computation graphs Y Xu, HJ Lee, D Chen, B Hechtman, Y Huang, R Joshi, M Krikun, ... arXiv preprint arXiv:2105.04663, 2021 | 130 | 2021 |
LaMDA: Language models for dialog applications R Thoppilan, D De Freitas, J Hall, N Shazeer, A Kulshreshtha, HT Cheng, ... | 108 | 2022 |
Beyond distillation: Task-level mixture-of-experts for efficient inference S Kudugunta, Y Huang, A Bapna, M Krikun, D Lepikhin, MT Luong, O Firat arXiv preprint arXiv:2110.03742, 2021 | 107 | 2021 |
PaLM 2 technical report R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 84 | 2023 |
PaLM 2 technical report R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... Jacob Devlin, Mark Díaz, Nan Du, Ethan Dyer, Vladimir Feinberg, Fangxiaoyu …, 2023 | 77 | 2023 |
Up next: retrieval methods for large scale related video suggestion M Bendersky, L Garcia-Pueyo, J Harmsen, V Josifovski, D Lepikhin Proceedings of the 20th ACM SIGKDD international conference on Knowledge …, 2014 | 46 | 2014 |
A very large diversity space of synthetically accessible compounds for use with drug design programs S Nikitin, N Zaitseva, O Demina, V Solovieva, E Mazin, S Mikhalev, ... Journal of computer-aided molecular design 19, 47-63, 2005 | 43 | 2005 |
Sibyl: A system for large scale supervised machine learning K Canini, T Chandra, E Ie, J McFadden, K Goldman, M Gunter, J Harmsen, ... Technical Talk 1 (113), 2.3, 2012 | 39 | 2012 |
LaMDA: Language models for dialog applications AD Cohen, A Roberts, A Molina, A Butryna, A Jin, A Kulshreshtha, ... arXiv preprint arXiv:2201.08239, 2022 | 35 | 2022 |
Massively multilingual neural machine translation in the wild: Findings and challenges A Naveen, B Ankur, F Orhan, L Dmitry, J Melvin, K Maxim, CM Xu, C Yuan, ... arXiv preprint arXiv:1907.05019, 2019 | 22 | 2019 |
PaLM 2 technical report A Rohan, M Dai Andrew, F Orhan, J Melvin, L Dmitry, P Alexandre, ... arXiv preprint arXiv:2305.10403, 1-93, 2023 | 8 | 2023 |