Exploring the limits of transfer learning with a unified text-to-text transformer C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ... Journal of machine learning research 21 (140), 1-67, 2020 | 21173 | 2020 |
Llama 2: Open foundation and fine-tuned chat models H Touvron, L Martin, K Stone, P Albert, A Almahairi, Y Babaei, ... arXiv preprint arXiv:2307.09288, 2023 | 12190 | 2023 |
PaLM: Scaling language modeling with pathways A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... Journal of Machine Learning Research 24 (240), 1-113, 2023 | 5604 | 2023 |
Deep Speech 2: End-to-end speech recognition in English and Mandarin D Amodei, S Ananthanarayanan, R Anubhai, J Bai, E Battenberg, C Case, ... International conference on machine learning, 173-182, 2016 | 3915 | 2016 |
Scaling instruction-finetuned language models HW Chung, L Hou, S Longpre, B Zoph, Y Tay, W Fedus, Y Li, X Wang, ... Journal of Machine Learning Research 25 (70), 1-53, 2024 | 3280 | 2024 |
The Llama 3 herd of models A Dubey, A Jauhri, A Pandey, A Kadian, A Al-Dahle, A Letman, A Mathur, ... arXiv preprint arXiv:2407.21783, 2024 | 2275 | 2024 |
Mixed precision training P Micikevicius, S Narang, J Alben, G Diamos, E Elsen, D Garcia, ... arXiv preprint arXiv:1710.03740, 2017 | 2123 | 2017 |
Self-consistency improves chain of thought reasoning in language models X Wang, J Wei, D Schuurmans, Q Le, E Chi, S Narang, A Chowdhery, ... arXiv preprint arXiv:2203.11171, 2022 | 1270 | 2022 |
Deep Voice 3: Scaling text-to-speech with convolutional sequence learning W Ping, K Peng, A Gibiansky, SO Arik, A Kannan, S Narang, J Raiman, ... arXiv preprint arXiv:1710.07654, 2017 | 919* | 2017 |
Deep learning scaling is predictable, empirically J Hestness, S Narang, N Ardalani, G Diamos, H Jun, H Kianinejad, ... arXiv preprint arXiv:1712.00409, 2017 | 829 | 2017 |
ByT5: Towards a token-free future with pre-trained byte-to-byte models L Xue, A Barua, N Constant, R Al-Rfou, S Narang, M Kale, A Roberts, ... Transactions of the Association for Computational Linguistics 10, 291-306, 2022 | 462 | 2022 |
Exploring sparsity in recurrent neural networks S Narang, E Elsen, G Diamos, S Sengupta arXiv preprint arXiv:1704.05119, 2017 | 381 | 2017 |
DSD: Regularizing deep neural networks with dense-sparse-dense training flow S Han, J Pool, S Narang, H Mao, S Tang, E Elsen, B Catanzaro, J Tran, ... arXiv preprint arXiv:1607.04381 3 (6), 2016 | 352* | 2016 |
WT5?! Training text-to-text models to explain their predictions S Narang, C Raffel, K Lee, A Roberts, N Fiedel, K Malkan arXiv preprint arXiv:2004.14546, 2020 | 213 | 2020 |
Effective long-context scaling of foundation models W Xiong, J Liu, I Molybog, H Zhang, P Bhargava, R Hou, L Martin, ... arXiv preprint arXiv:2309.16039, 2023 | 172 | 2023 |
Scaling up models and data with t5x and seqio A Roberts, HW Chung, G Mishra, A Levskaya, J Bradbury, D Andor, ... Journal of Machine Learning Research 24 (377), 1-8, 2023 | 162 | 2023 |
Block-sparse recurrent neural networks S Narang, E Undersander, G Diamos arXiv preprint arXiv:1711.02782, 2017 | 161 | 2017 |