Language models are few-shot learners T Brown, B Mann, N Ryder, M Subbiah, JD Kaplan, P Dhariwal, ... Advances in neural information processing systems 33, 1877-1901, 2020 | 49170* | 2020 |
Language models are unsupervised multitask learners A Radford, J Wu, R Child, D Luan, D Amodei, I Sutskever OpenAI blog 1 (8), 9, 2019 | 26724* | 2019 |
PaLM: Scaling language modeling with pathways A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... Journal of Machine Learning Research 24 (240), 1-113, 2023 | 5622 | 2023 |
Scaling laws for neural language models J Kaplan, S McCandlish, T Henighan, TB Brown, B Chess, R Child, ... arXiv preprint arXiv:2001.08361, 2020 | 2970 | 2020 |
Generating long sequences with sparse transformers R Child, S Gray, A Radford, I Sutskever arXiv preprint arXiv:1904.10509, 2019 | 2160 | 2019 |
Generative pretraining from pixels M Chen, A Radford, R Child, J Wu, H Jun, D Luan, I Sutskever International conference on machine learning, 1691-1703, 2020 | 1888 | 2020 |
Convolutional recurrent neural networks for small-footprint keyword spotting SO Arik, M Kliegl, R Child, J Hestness, A Gibiansky, C Fougner, ... arXiv preprint arXiv:1703.05390, 2017 | 839 | 2017 |
Using DeepSpeed and Megatron to train Megatron-Turing NLG 530B, a large-scale generative language model S Smith, M Patwary, B Norick, P LeGresley, S Rajbhandari, J Casper, ... arXiv preprint arXiv:2201.11990, 2022 | 656 | 2022 |
Very deep VAEs generalize autoregressive models and can outperform them on images R Child International Conference on Learning Representations (ICLR) 2021, Spotlight, 2020 | 362 | 2020 |
Exploring neural transducers for end-to-end speech recognition E Battenberg, J Chen, R Child, A Coates, YGY Li, H Liu, S Satheesh, ... 2017 IEEE automatic speech recognition and understanding workshop (ASRU …, 2017 | 290* | 2017 |
Language models are few-shot learners B Mann, N Ryder, M Subbiah, J Kaplan, P Dhariwal, A Neelakantan, ... arXiv preprint arXiv:2005.14165 1, 3, 2020 | 280 | 2020 |
Language models are few-shot learners (arXiv: 2005.14165). arXiv TB Brown, B Mann, N Ryder, M Subbiah, J Kaplan, P Dhariwal, ... | 180 | 2020 |
Scaling laws for neural language models. arXiv 2020 J Kaplan, S McCandlish, T Henighan, TB Brown, B Chess, R Child, ... arXiv preprint arXiv:2001.08361, 2020 | 166 | 2020 |
PaLM: Scaling language modeling with pathways, 2022 A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... arXiv preprint arXiv:2204.02311, 2022 | 133 | 2022 |
PaLM: Scaling language modeling with pathways. arXiv 2022 A Chowdhery, S Narang, J Devlin, M Bosma, G Mishra, A Roberts, ... arXiv preprint arXiv:2204.02311 10, 1, 2022 | 123 | 2022 |
Language models are few-shot learners. CoRR abs/2005.14165 (2020) TB Brown, B Mann, N Ryder, M Subbiah, J Kaplan, P Dhariwal, ... URL: https://arxiv.org/abs/2005.14165, 2020 | 119 | 2020 |
DALL·E: Creating images from text A Ramesh, M Pavlov, G Goh, S Gray, M Chen, R Child, V Misra, P Mishkin, ... OpenAI blog 2, 2021 | 110 | 2021 |
Active learning for speech recognition: the power of gradients J Huang, R Child, V Rao, H Liu, S Satheesh, A Coates arXiv preprint arXiv:1612.03226, 2016 | 81 | 2016 |
Language models are few-shot learners TB Brown, B Mann, N Ryder, M Subbiah, J Kaplan, P Dhariwal, ..., D Amodei 1877-1901, 2020 | 77 | 2020 |
Language models are unsupervised multitask learners. OpenAI blog (2019) A Radford, J Wu, R Child, D Luan, D Amodei, I Sutskever URL: https://d4mucfpksywv.cloudfront.net/better-language-models/language …, 2019 | 75 | 2019 |