Simplified state space layers for sequence modeling

JTH Smith, A Warrington, SW Linderman - arXiv preprint arXiv:2208.04933, 2022 - arxiv.org
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
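
Concretely, the recurrence such a layer applies is the discrete linear state space model x_k = A x_{k-1} + B u_k, y_k = C x_k. A minimal NumPy sketch of this recurrence follows; the sequential loop and the toy diagonal A are illustrative only, since S4/S5 rely on structured parameterizations and fast scan algorithms:

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Discrete linear state space recurrence applied over a sequence u:
    x_k = A x_{k-1} + B u_k,   y_k = C x_k."""
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B @ u_k
        ys.append(C @ x)
    return np.stack(ys)

# Toy usage: state size 4, scalar input/output, length-10 sequence.
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)              # stable toy dynamics (S4/S5 use structured A)
B = rng.normal(size=(4, 1))
C = rng.normal(size=(1, 4))
u = rng.normal(size=(10, 1))
y = ssm_scan(A, B, C, u)         # shape (10, 1)
```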

Monarch Mixer: A simple sub-quadratic GEMM-based architecture

D Fu, S Arora, J Grogan, I Johnson… - Advances in …, 2024 - proceedings.neurips.cc
Machine learning models are increasingly being scaled in both sequence length
and model dimension to reach longer contexts and better performance. However, existing …
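
For context, a Monarch-style matrix-vector product can be sketched as two block-diagonal GEMMs interleaved with a fixed permutation, which is where the sub-quadratic cost comes from; this toy NumPy version is illustrative and omits the paper's exact factorization and batching:

```python
import numpy as np

def monarch_matvec(x, L, R):
    """Multiply a length-(b*b) vector by a Monarch-style matrix: two batches of
    b dense (b x b) blocks applied as block-diagonal matmuls, interleaved with
    a fixed permutation (here a transpose). Cost O(n^{3/2}) vs O(n^2) dense."""
    b = L.shape[0]
    X = x.reshape(b, b)                    # view the vector as a b x b grid
    X = np.einsum('ijk,ik->ij', R, X)      # block-diagonal factor R
    X = X.T                                # fixed permutation
    X = np.einsum('ijk,ik->ij', L, X)      # block-diagonal factor L
    return X.T.reshape(-1)

rng = np.random.default_rng(0)
b = 4                                      # n = b * b = 16
L = rng.normal(size=(b, b, b))
R = rng.normal(size=(b, b, b))
y = monarch_matvec(rng.normal(size=b * b), L, R)
```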

Mega: moving average equipped gated attention

X Ma, C Zhou, X Kong, J He, L Gui, G Neubig… - arXiv preprint arXiv …, 2022 - arxiv.org
The design choices in the Transformer attention mechanism, including weak inductive bias
and quadratic computational complexity, have limited its application for modeling long …
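
The moving-average component referenced in the title is a damped exponential moving average (EMA); a one-dimensional NumPy sketch of that recurrence follows, with scalar alpha and delta standing in for Mega's learned multi-dimensional parameters:

```python
import numpy as np

def damped_ema(x, alpha, delta):
    """Damped exponential moving average:
    h_t = alpha * x_t + (1 - alpha * delta) * h_{t-1}."""
    h, out = 0.0, []
    for x_t in x:
        h = alpha * x_t + (1.0 - alpha * delta) * h
        out.append(h)
    return np.array(out)

# Toy usage: smooth a noisy sine, biasing the model toward local context.
rng = np.random.default_rng(0)
x = np.sin(np.linspace(0, 6, 50)) + 0.1 * rng.normal(size=50)
smoothed = damped_ema(x, alpha=0.3, delta=0.9)
```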

More ConvNets in the 2020s: Scaling up kernels beyond 51x51 using sparsity

S Liu, T Chen, X Chen, X Chen, Q Xiao, B Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformers have quickly shone in the computer vision world since the emergence of
Vision Transformers (ViTs). The dominant role of convolutional neural networks (CNNs) …
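
A hedged sketch of the decomposition trick behind such kernel sizes: replace one very large square kernel with two complementary rectangular kernels applied in parallel and summed (SLaK additionally trains the weights sparsely, which is omitted here):

```python
import numpy as np
from scipy.signal import convolve2d

def decomposed_large_kernel(img, k_wide, k_tall):
    """Approximate one very large square kernel with two rectangular kernels
    (e.g., 5x51 and 51x5) applied in parallel and summed."""
    return (convolve2d(img, k_wide, mode='same')
            + convolve2d(img, k_tall, mode='same'))

rng = np.random.default_rng(0)
img = rng.normal(size=(64, 64))
k_wide = rng.normal(size=(5, 51)) / 255.0            # short-and-wide branch
k_tall = rng.normal(size=(51, 5)) / 255.0            # tall-and-narrow branch
out = decomposed_large_kernel(img, k_wide, k_tall)   # shape (64, 64)
```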

Towards multi-spatiotemporal-scale generalized PDE modeling

JK Gupta, J Brandstetter - arXiv preprint arXiv:2209.15616, 2022 - arxiv.org
Partial differential equations (PDEs) are central to describing complex physical systems.
Their expensive solution techniques have led to an increased interest in deep …
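
For context, a common building block in the Fourier-based neural PDE surrogates benchmarked in this line of work is the spectral convolution; a NumPy sketch follows, assuming a periodic 1D grid and with random toy values in place of learned spectral weights:

```python
import numpy as np

def spectral_conv1d(u, w_modes):
    """Spectral convolution on a periodic 1D grid: FFT, scale the lowest
    Fourier modes by (learned) complex weights, zero the rest, inverse FFT."""
    u_hat = np.fft.rfft(u)
    out_hat = np.zeros_like(u_hat)
    m = len(w_modes)
    out_hat[:m] = u_hat[:m] * w_modes
    return np.fft.irfft(out_hat, n=len(u))

rng = np.random.default_rng(0)
u = np.sin(np.linspace(0, 2 * np.pi, 128, endpoint=False))  # discretized state
w = rng.normal(size=8) + 1j * rng.normal(size=8)            # toy spectral weights
u_next = spectral_conv1d(u, w)                              # shape (128,)
```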

Convolutional networks with oriented 1D kernels

A Kirchmeyer, J Deng - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
In computer vision, 2D convolution is arguably the most important operation performed by a
ConvNet. Unsurprisingly, it has been the focus of intense software and hardware …
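
A minimal NumPy sketch of a 1D convolution oriented along an arbitrary lattice direction (dy, dx); the circular boundary handling via np.roll is a simplification, not the paper's implementation:

```python
import numpy as np

def oriented_conv1d(img, kernel, dy, dx):
    """1D convolution along the lattice direction (dy, dx):
    (0, 1) = horizontal, (1, 0) = vertical, (1, 1) = diagonal.
    Uses circular (np.roll) boundary handling for brevity."""
    out = np.zeros_like(img, dtype=float)
    c = len(kernel) // 2
    for i, w in enumerate(kernel):
        s = i - c
        out += w * np.roll(img, (-s * dy, -s * dx), axis=(0, 1))
    return out

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 8))
k = np.array([1.0, 2.0, 1.0]) / 4.0
horiz = oriented_conv1d(img, k, 0, 1)   # smooth along rows
diag = oriented_conv1d(img, k, 1, 1)    # smooth along the main diagonal
```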

Learning long sequences in spiking neural networks

MI Stan, O Rhodes - Scientific Reports, 2024 - nature.com
Spiking neural networks (SNNs) take inspiration from the brain to enable energy-efficient
computations. Since the advent of Transformers, SNNs have struggled to compete with …
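
For context, the leaky integrate-and-fire (LIF) dynamics that spiking sequence models build on can be sketched as follows; the decay, threshold, and soft-reset choices here are generic defaults, not the paper's configuration:

```python
import numpy as np

def lif_neuron(currents, beta=0.9, threshold=1.0):
    """Leaky integrate-and-fire neuron: the membrane potential decays by beta,
    integrates the input current, and emits a binary spike at threshold,
    followed by a soft reset (subtraction)."""
    v, spikes = 0.0, []
    for i_t in currents:
        v = beta * v + i_t
        s = float(v >= threshold)
        v -= s * threshold
        spikes.append(s)
    return np.array(spikes)

currents = 0.4 * np.abs(np.random.default_rng(0).normal(size=20))
spike_train = lif_neuron(currents)   # binary 0/1 sequence
```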

Transformers significantly improve splice site prediction

BA Jónsson, GH Halldórsson, S Árdal… - Communications …, 2024 - nature.com
Mutations that affect RNA splicing significantly impact human diversity and disease. Here we
present a method using transformers, a type of machine learning model, to detect splicing …
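
As a small illustration of the input side of such a model, a DNA window around a candidate splice site is typically one-hot encoded before entering a transformer; the helper below and its example sequence are illustrative, not the paper's preprocessing:

```python
import numpy as np

def one_hot_dna(seq):
    """One-hot encode a DNA window into the (length, 4) array that a
    transformer-style splice-site classifier would consume."""
    lookup = {'A': 0, 'C': 1, 'G': 2, 'T': 3}
    out = np.zeros((len(seq), 4))
    for i, base in enumerate(seq):
        out[i, lookup[base]] = 1.0
    return out

# Canonical donor splice sites begin with the dinucleotide 'GT' on the intron side.
x = one_hot_dna("ACGGTAAGT")   # shape (9, 4)
```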

QuadConv: Quadrature-based convolutions with applications to non-uniform PDE data compression

K Doherty, C Simpson, S Becker, A Doostan - Journal of Computational …, 2024 - Elsevier
We present a new convolution layer for deep learning architectures which we call
QuadConv—an approximation to continuous convolution via quadrature. Our operator is …
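
The quadrature idea in one dimension: approximate the continuous convolution (f * k)(x) = ∫ f(y) k(x − y) dy by a weighted sum over non-uniform sample points. In the NumPy sketch below, a fixed Gaussian kernel stands in for QuadConv's learned kernel:

```python
import numpy as np

def quad_conv(f, kernel, points, weights, x_out):
    """Quadrature approximation to the continuous convolution
    (f * k)(x) = integral of f(y) k(x - y) dy  ~  sum_i w_i f(y_i) k(x - y_i),
    evaluated at each output location x in x_out."""
    return np.array([np.sum(weights * f(points) * kernel(x - points))
                     for x in x_out])

# Toy usage on non-uniform sample points: smooth sin with a Gaussian kernel.
rng = np.random.default_rng(0)
pts = np.sort(rng.uniform(0.0, 2 * np.pi, 64))   # non-uniform sample locations
w = np.gradient(pts)                             # crude quadrature weights
k = lambda r: np.exp(-(r / 0.3) ** 2)
y = quad_conv(np.sin, k, pts, w, np.linspace(0.0, 2 * np.pi, 16))
```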

DNArch: Learning convolutional neural architectures by backpropagation

DW Romero, N Zeghidour - arXiv preprint arXiv:2302.05400, 2023 - arxiv.org
We present Differentiable Neural Architectures (DNArch), a method that jointly learns the
weights and the architecture of Convolutional Neural Networks (CNNs) by backpropagation …
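
One way to make a kernel size learnable by backpropagation, in the spirit of DNArch's differentiable masks over kernel dimensions: multiply an over-sized kernel by a smooth window whose width is a trainable scalar. This forward-pass NumPy sketch is an assumption-laden illustration, not the paper's exact parameterization:

```python
import numpy as np

def masked_kernel(weights, log_sigma):
    """Differentiable kernel-size control: scale an over-sized 1D kernel by a
    Gaussian window whose width sigma = exp(log_sigma) is a trainable scalar;
    small sigma shrinks the effective kernel, large sigma keeps all taps."""
    k = len(weights)
    pos = np.arange(k) - k // 2
    mask = np.exp(-0.5 * (pos / np.exp(log_sigma)) ** 2)
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=31)                    # over-provisioned kernel, 31 taps
small = masked_kernel(w, log_sigma=0.0)    # sigma = 1: ~3 effective taps
large = masked_kernel(w, log_sigma=2.3)    # sigma ~ 10: nearly the full kernel
```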