Scaling up your kernels to 31x31: Revisiting large kernel design in CNNs
We revisit large kernel design in modern convolutional neural networks (CNNs). Inspired by
recent advances in vision transformers (ViTs), in this paper, we demonstrate that using a few …
Efficiently modeling long sequences with structured state spaces
A central goal of sequence modeling is designing a single principled model that can
address sequence data across a range of modalities and tasks, particularly on long-range …
Hyena hierarchy: Towards larger convolutional language models
Recent advances in deep learning have relied heavily on the use of large Transformers due
to their ability to learn at scale. However, the core building block of Transformers, the …
HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution
Genomic (DNA) sequences encode an enormous amount of information for gene regulation
and protein synthesis. Similar to natural language models, researchers have proposed …
On the parameterization and initialization of diagonal state space models
State space models (SSMs) have recently been shown to be very effective as a deep learning
layer, offering a promising alternative to sequence models such as RNNs, CNNs, or Transformers …
Combining recurrent, convolutional, and continuous-time models with linear state space layers
Recurrent neural networks (RNNs), temporal convolutions, and neural differential equations
(NDEs) are popular families of deep learning models for time-series data, each with unique …
Simplified state space layers for sequence modeling
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
S4ND: Modeling images and videos as multidimensional signals with state spaces
Visual data such as images and videos are typically modeled as discretizations of inherently
continuous, multidimensional signals. Existing continuous-signal models attempt to exploit …
Monarch Mixer: A simple sub-quadratic GEMM-based architecture
Machine learning models are increasingly being scaled in both sequence length
and model dimension to reach longer contexts and better performance. However, existing …
Mega: Moving average equipped gated attention
The design choices in the Transformer attention mechanism, including weak inductive bias
and quadratic computational complexity, have limited its application for modeling long …