Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - arxiv.org
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
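
The snippet above is truncated, but the title's "selective state spaces" refers to an SSM whose dynamics depend on the input. As a rough sketch only (the function name, shapes, and the way B and C are scaled here are assumptions for illustration, not the paper's implementation, which fuses this recurrence into a hardware-aware parallel scan), a sequential version of an input-dependent diagonal SSM can be written as:

import numpy as np

def selective_ssm_scan(x, A, B_proj, C_proj, dt_proj):
    # Sequential sketch of a selective (input-dependent) diagonal SSM.
    #   x       : (T, D) input sequence
    #   A       : (D, N) negative-real diagonal state matrix, one row per channel
    #   B_proj, C_proj : (D, N) parameters; here B is simply scaled by the input,
    #                    a simplification of the projections used in practice
    #   dt_proj : (D,) per-channel step-size parameters
    T, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))                               # hidden state per channel
    y = np.zeros((T, D))
    for t in range(T):
        dt = np.log1p(np.exp(dt_proj + x[t]))          # softplus: step size depends on x_t ("selection")
        A_bar = np.exp(dt[:, None] * A)                # zero-order-hold style discretization
        B_bar = dt[:, None] * B_proj * x[t][:, None]   # input enters through B
        h = A_bar * h + B_bar                          # linear recurrence: O(T) in sequence length
        y[t] = (h * C_proj).sum(axis=-1)               # readout through C
    return y

# Example with random parameters (A kept negative for stability):
rng = np.random.default_rng(0)
T, D, N = 16, 4, 8
y = selective_ssm_scan(rng.standard_normal((T, D)),
                       -np.exp(rng.standard_normal((D, N))),
                       rng.standard_normal((D, N)),
                       rng.standard_normal((D, N)),
                       rng.standard_normal(D))

The point of the sketch is the complexity argument: each step touches only a fixed-size state, so a full pass is linear in T, unlike attention.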

Vision Mamba: Efficient visual representation learning with bidirectional state space model

L Zhu, B Liao, Q Zhang, X Wang, W Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, state space models (SSMs) with efficient hardware-aware designs, i.e., the
Mamba deep learning model, have shown great potential for long sequence modeling …

RWKV: Reinventing RNNs for the Transformer era

B Peng, E Alcaide, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have revolutionized almost all natural language processing (NLP) tasks but
suffer from memory and computational complexity that scales quadratically with sequence …

Hyena Hierarchy: Towards larger convolutional language models

M Poli, S Massaroli, E Nguyen, DY Fu… - International …, 2023 - proceedings.mlr.press
Recent advances in deep learning have relied heavily on the use of large Transformers due
to their ability to learn at scale. However, the core building block of Transformers, the …

Resurrecting recurrent neural networks for long sequences

A Orvieto, SL Smith, A Gu, A Fernando… - International …, 2023 - proceedings.mlr.press
Recurrent Neural Networks (RNNs) offer fast inference on long sequences but are
hard to optimize and slow to train. Deep state-space models (SSMs) have recently been …

Sequence modeling and design from molecular to genome scale with Evo

E Nguyen, M Poli, MG Durrant, B Kang, D Katrekar… - Science, 2024 - science.org
The genome is a sequence that encodes the DNA, RNA, and proteins that orchestrate an
organism's function. We present Evo, a long-context genomic foundation model with a …

Efficient large language models: A survey

Z Wan, X Wang, C Liu, S Alam, Y Zheng, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities in important
tasks such as natural language understanding and language generation, and thus have the …

Hungry Hungry Hippos: Towards language modeling with state space models

DY Fu, T Dao, KK Saab, AW Thomas, A Rudra… - arXiv preprint arXiv …, 2022 - arxiv.org
State space models (SSMs) have demonstrated state-of-the-art sequence modeling
performance in some modalities, but underperform attention in language modeling …

On the parameterization and initialization of diagonal state space models

A Gu, K Goel, A Gupta, C Ré - Advances in Neural …, 2022 - proceedings.neurips.cc
State space models (SSMs) have recently been shown to be very effective as a deep learning
layer, offering a promising alternative to sequence models such as RNNs, CNNs, or Transformers …
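
This entry concerns how diagonal SSM layers are parameterized and discretized. As a minimal, generic sketch (the zero-order-hold rule below is a standard construction, not necessarily this paper's exact choice of parameterization or initialization):

import numpy as np

def discretize_diagonal_ssm(A_diag, B, dt):
    # Zero-order-hold discretization of the diagonal continuous-time SSM
    #   h'(t) = A h(t) + B x(t),  with A = diag(A_diag) and step size dt.
    # Returns (A_bar, B_bar) for the discrete recurrence
    #   h_k = A_bar * h_{k-1} + B_bar * x_k.
    # Assumes every entry of A_diag is nonzero (true for the usual stable initializations).
    A_bar = np.exp(dt * A_diag)
    B_bar = (A_bar - 1.0) / A_diag * B   # exact ZOH input term for diagonal A
    return A_bar, B_bar

Because A is diagonal, both the discretization and the resulting recurrence are elementwise, which is what makes such layers cheap compared with a dense state matrix.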

S4ND: Modeling images and videos as multidimensional signals with state spaces

E Nguyen, K Goel, A Gu, G Downs… - Advances in Neural …, 2022 - proceedings.neurips.cc
Visual data such as images and videos are typically modeled as discretizations of inherently
continuous, multidimensional signals. Existing continuous-signal models attempt to exploit …