Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by
recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few …
Efficiently modeling long sequences with structured state spaces
A central goal of sequence modeling is designing a single principled model that can
address sequence data across a range of modalities and tasks, particularly on long-range …
Hyena hierarchy: Towards larger convolutional language models
Recent advances in deep learning have relied heavily on the use of large Transformers due
to their ability to learn at scale. However, the core building block of Transformers, the …
HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution
Genomic (DNA) sequences encode an enormous amount of information for gene regulation
and protein synthesis. Similar to natural language models, researchers have proposed …
On the parameterization and initialization of diagonal state space models
State space models (SSMs) have recently been shown to be very effective as a deep learning
layer, offering a promising alternative to sequence models such as RNNs, CNNs, or Transformers …
Combining recurrent, convolutional, and continuous-time models with linear state space layers
Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations
(NDEs) are popular families of deep learning models for time-series data, each with unique …
Simplified state space layers for sequence modeling
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
S4ND: Modeling images and videos as multidimensional signals with state spaces
Visual data such as images and videos are typically modeled as discretizations of inherently
continuous, multidimensional signals. Existing continuous-signal models attempt to exploit …
Monarch Mixer: A simple sub-quadratic GEMM-based architecture
Machine learning models are increasingly being scaled in both sequence length
and model dimension to reach longer contexts and better performance. However, existing …
Mega: Moving average equipped gated attention
The design choices in the Transformer attention mechanism, including weak inductive bias
and quadratic computational complexity, have limited its application for modeling long …