Mamba: Linear-time sequence modeling with selective state spaces

A Gu, T Dao - arXiv preprint arXiv:2312.00752, 2023 - minjiazhang.github.io
Foundation models, now powering most of the exciting applications in deep learning, are
almost universally based on the Transformer architecture and its core attention module …
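
The selective-scan recurrence behind this entry can be sketched in a few lines. Below is a minimal sequential NumPy version, assuming a diagonal state matrix and illustrative projection names (W_dt, W_B, W_C are not the paper's API); the paper itself fuses this loop into a hardware-aware parallel scan:

```python
import numpy as np

def selective_ssm(x, A, W_dt, W_B, W_C):
    """Sequential sketch of a selective SSM scan in the spirit of Mamba's S6.

    x: (T, D) inputs; A: (D, N) diagonal state matrix (entries < 0);
    W_dt: (D, D) and W_B, W_C: (D, N) make the step size Delta and the
    B, C matrices input-dependent -- the 'selective' part. Names here
    are illustrative, not the paper's API.
    """
    T, D = x.shape
    h = np.zeros((D, A.shape[1]))                  # one state row per channel
    y = np.empty_like(x)
    for t in range(T):
        dt = np.logaddexp(0.0, x[t] @ W_dt)        # softplus step size, (D,)
        B, C = x[t] @ W_B, x[t] @ W_C              # per-step B_t, C_t, (N,)
        A_bar = np.exp(dt[:, None] * A)            # ZOH discretization, (D, N)
        h = A_bar * h + (dt[:, None] * B) * x[t][:, None]
        y[t] = h @ C
    return y
```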

Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …
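
The duality in the title is easy to demonstrate: a scalar-gated linear recurrence and a masked, attention-like matrix product compute the same output. A hedged sketch (array names are illustrative, not the paper's API):

```python
import numpy as np

def ssd_recurrent(x, a, B, C):
    """Linear form: h_t = a_t h_{t-1} + B_t x_t^T, y_t = C_t h_t.
    x: (T, d); a: (T,) positive per-step decays; B, C: (T, N)."""
    h = np.zeros((B.shape[1], x.shape[1]))
    y = np.empty_like(x)
    for t in range(x.shape[0]):
        h = a[t] * h + np.outer(B[t], x[t])
        y[t] = C[t] @ h
    return y

def ssd_quadratic(x, a, B, C):
    """Dual 'attention' form: y = (L * (C B^T)) x, where the causal mask
    L[i, j] = a[j+1] * ... * a[i] holds cumulative decay products."""
    cs = np.cumsum(np.log(a))
    L = np.tril(np.exp(cs[:, None] - cs[None, :]))
    return ((C @ B.T) * L) @ x
```

On random inputs the two functions agree to floating-point precision; the paper's algorithms exploit exactly this equivalence to mix recurrent and block-matrix computation.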

Simplified state space layers for sequence modeling

JTH Smith, A Warrington, SW Linderman - arXiv preprint arXiv:2208.04933, 2022 - arxiv.org
Models using structured state space sequence (S4) layers have achieved state-of-the-art
performance on long-range sequence modeling tasks. An S4 layer combines linear state …
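
The snippet breaks off at the layer's key ingredient, a linear state space model. Because an S4-style layer is linear and time-invariant, it can be unrolled into a convolution kernel; a minimal sketch, assuming the discretized parameters A_bar, B_bar, C are already given:

```python
import numpy as np

def ssm_kernel(A_bar, B_bar, C, L):
    """Unroll the discretized LTI SSM  x_k = A_bar x_{k-1} + B_bar u_k,
    y_k = C x_k  into its length-L convolution kernel
    K = (C B_bar, C A_bar B_bar, ..., C A_bar^{L-1} B_bar)."""
    K, v = np.empty(L), B_bar
    for l in range(L):
        K[l] = C @ v
        v = A_bar @ v
    return K

def ssm_apply(u, A_bar, B_bar, C):
    """Apply the SSM to a 1-D input as a causal convolution."""
    K = ssm_kernel(A_bar, B_bar, C, len(u))
    return np.convolve(u, K)[: len(u)]
```

S4 computes this kernel cheaply through a structured parameterization and FFTs; the simplified S5 layer of this entry instead runs one multi-input, multi-output SSM directly with a parallel scan.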

State space models for event cameras

N Zubic, M Gehrig… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Today, state-of-the-art deep neural networks that process event-camera data first convert a
temporal window of events into dense grid-like input representations. As such, they exhibit …

The hidden attention of mamba models

A Ali, I Zimerman, L Wolf - arXiv preprint arXiv:2403.01590, 2024 - arxiv.org
The Mamba layer offers an efficient selective state space model (SSM) that is highly effective
in modeling multiple domains, including NLP, long-range sequence processing, and …

The illusion of state in state-space models

W Merrill, J Petty, A Sabharwal - arXiv preprint arXiv:2404.08819, 2024 - arxiv.org
State-space models (SSMs) have emerged as a potential alternative architecture for building
large language models (LLMs) compared to the previously ubiquitous transformer …
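
The "state" at issue can be made concrete with the paper's canonical hard case: composing permutations (the word problem for S_5), which a genuine recurrent state handles in constant memory per step but which, the authors argue, fixed-depth SSMs, like transformers, cannot express, since both lie in TC^0. A small illustration of the task:

```python
import random

def compose(perms):
    """Word problem for S_5: fold a sequence of permutations of {0..4}
    into one. A recurrent state tracks this in O(1) memory per step."""
    state = list(range(5))                 # identity permutation
    for p in perms:
        state = [state[i] for i in p]      # fold the next permutation in
    return state

seq = [random.sample(range(5), 5) for _ in range(1000)]
print(compose(seq))
```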

Convolutional state space models for long-range spatiotemporal modeling

J Smith, S De Mello, J Kautz… - Advances in Neural …, 2023 - proceedings.neurips.cc
Effectively modeling long spatiotemporal sequences is challenging due to the need to model
complex spatial correlations and long-range temporal dependencies simultaneously …
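
One reading of "convolutional state space model" that matches this line of work: keep the SSM recurrence but replace its matrix multiplications with spatial convolutions, so the state retains the input's grid layout. A rough single-channel sketch under that assumption:

```python
import numpy as np
from scipy.signal import convolve2d

def conv_ssm_step(h, u, A_k, B_k):
    """One ConvSSM-style step: the SSM update h_t = A h_{t-1} + B u_t with
    the matrix products replaced by 2-D convolutions, so the state h keeps
    the spatial layout of the frames. h, u: (H, W); A_k, B_k: small kernels
    (illustrative; the paper works with multi-channel tensors)."""
    return (convolve2d(h, A_k, mode="same")
            + convolve2d(u, B_k, mode="same"))
```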

Eagle and Finch: RWKV with matrix-valued states and dynamic recurrence

B Peng, D Goldstein, Q Anthony, A Albalak… - arXiv preprint arXiv …, 2024 - openreview.net
We present Eagle (RWKV-5) and Finch (RWKV-6), sequence models improving
upon the RWKV (RWKV-4) architecture (Peng et al., 2023). Our architectural design …
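
The matrix-valued state in the title replaces RWKV-4's vector recurrence with a d x d state updated by outer products, much as in linear attention. A simplified sketch that omits Eagle's per-token bonus term and Finch's data-dependent decay:

```python
import numpy as np

def matrix_state_rwkv(r, k, v, w):
    """Sketch of an Eagle/Finch-style matrix-valued-state recurrence:
    S_t = diag(w) S_{t-1} + k_t v_t^T,   y_t = r_t S_t.
    r, k, v: (T, d); w: (d,) decay in (0, 1). Simplified: RWKV-5 also adds
    a per-token 'bonus' term, and RWKV-6 makes w input-dependent."""
    T, d = r.shape
    S = np.zeros((d, d))
    y = np.empty((T, d))
    for t in range(T):
        S = w[:, None] * S + np.outer(k[t], v[t])
        y[t] = r[t] @ S
    return y
```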

GraphChi: Large-scale graph computation on just a PC

A Kyrola, G Blelloch, C Guestrin - 10th USENIX Symposium on Operating …, 2012 - usenix.org
Current systems for graph computation require a distributed computing cluster to handle
very large real-world problems, such as analysis on social networks or the web graph. While …

Structured parallel programming: patterns for efficient computation

M McCool, J Reinders, A Robison - 2012 - books.google.com
Structured Parallel Programming offers the simplest way for developers to learn patterns for
high-performance parallel programming. Written by parallel computing experts and industry …