Resurrecting recurrent neural networks for long sequences. A Orvieto, SL Smith, A Gu, A Fernando, C Gulcehre, R Pascanu, S De. arXiv preprint arXiv:2303.06349, 2023. Cited by 251.
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models. S De, SL Smith, A Fernando, A Botev, G Cristian-Muraru, A Gu, R Haroun, ... arXiv preprint arXiv:2402.19427, 2024. Cited by 87.
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models. A Botev, S De, SL Smith, A Fernando, GC Muraru, R Haroun, L Berrada, ... arXiv preprint arXiv:2404.07839, 2024. Cited by 7.