Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …
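
The "duality" in the title can be made concrete with a small sketch: a scalar-gated linear SSM computed once as a sequential scan and once as a masked, attention-like matrix multiplication that yields the same outputs. The shapes, the scalar per-step gate, and the absence of any normalization below are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

# Illustrative sketch only (not the paper's code): a scalar-gated linear SSM
# computed two ways, as a sequential scan and as one masked matrix multiply.
T, N = 6, 4                       # sequence length, state dimension (assumed)
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, T)      # per-step decay a_t
B = rng.standard_normal((T, N))   # input projections B_t
C = rng.standard_normal((T, N))   # output projections C_t
x = rng.standard_normal(T)        # scalar input stream

# Recurrent (SSM) form: h_t = a_t h_{t-1} + B_t x_t,  y_t = C_t^T h_t.
h = np.zeros(N)
y_rec = np.empty(T)
for t in range(T):
    h = a[t] * h + B[t] * x[t]
    y_rec[t] = C[t] @ h

# Dual (attention-like) form: y = ((C B^T) * L) x, with a semiseparable mask
# L[t, s] = a_{s+1} * ... * a_t for s <= t, and 0 otherwise.
L = np.zeros((T, T))
for t in range(T):
    for s in range(t + 1):
        L[t, s] = np.prod(a[s + 1 : t + 1])
y_dual = ((C @ B.T) * L) @ x

assert np.allclose(y_rec, y_dual)
```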

Gated linear attention transformers with hardware-efficient training

S Yang, B Wang, Y Shen, R Panda, Y Kim - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers with linear attention allow for efficient parallel training but can simultaneously
be formulated as an RNN with 2D (matrix-valued) hidden states, thus enjoying linear-time …
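
The RNN view mentioned in the snippet can be sketched directly: causal, unnormalized linear attention computed once in parallel and once as a recurrence over a matrix-valued state. The data-dependent gating that gives GLA its name is omitted, so treat this as a generic linear-attention sketch under those assumptions.

```python
import numpy as np

# Generic sketch of the parallel/recurrent equivalence of linear attention;
# the gating of GLA itself is omitted (assumption-level illustration).
T, d = 5, 3
rng = np.random.default_rng(1)
Q = rng.standard_normal((T, d))
K = rng.standard_normal((T, d))
V = rng.standard_normal((T, d))

# Parallel form: causal, unnormalized linear attention, O(T^2) in sequence length.
mask = np.tril(np.ones((T, T)))
y_parallel = (mask * (Q @ K.T)) @ V

# Recurrent form: a 2D (matrix-valued) hidden state S, updated in O(T).
S = np.zeros((d, d))
y_recurrent = np.empty((T, d))
for t in range(T):
    S = S + np.outer(K[t], V[t])   # S_t = S_{t-1} + k_t v_t^T
    y_recurrent[t] = S.T @ Q[t]    # o_t = S_t^T q_t

assert np.allclose(y_parallel, y_recurrent)
```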

xLSTM: Extended long short-term memory

M Beck, K Pöppel, M Spanring, A Auer… - arXiv preprint arXiv …, 2024 - arxiv.org
In the 1990s, the constant error carousel and gating were introduced as the central ideas of
the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and …

Learning to (learn at test time): RNNs with expressive hidden states

Y Sun, X Li, K Dalal, J Xu, A Vikram, G Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-attention performs well in long context but has quadratic complexity. Existing RNN
layers have linear complexity, but their performance in long context is limited by the …
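
One way to read "expressive hidden states" is sketched below, as an assumption rather than the paper's exact layer: the hidden state is the weight matrix of a tiny model, updated by one gradient step per token on a self-supervised reconstruction loss at test time.

```python
import numpy as np

# Assumption-level sketch, not the paper's exact layer: the hidden state is the
# weight matrix W of a tiny linear model, and each incoming token triggers one
# gradient step on a self-supervised reconstruction loss 0.5 * ||W x - x||^2.
T, d, lr = 8, 4, 0.1
rng = np.random.default_rng(2)
X = rng.standard_normal((T, d))   # token stream

W = np.zeros((d, d))              # "hidden state" = model weights
outputs = np.empty((T, d))
for t in range(T):
    x = X[t]
    grad = np.outer(W @ x - x, x) # d/dW of 0.5 * ||W x - x||^2
    W = W - lr * grad             # test-time ("inner loop") update
    outputs[t] = W @ x            # layer output for token t
```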

ZigMa: A DiT-style zigzag Mamba diffusion model

VT Hu, SA Baumann, M Gui, O Grebenkova… - … on Computer Vision, 2024 - Springer
Diffusion models have long been plagued by scalability and quadratic complexity issues,
especially within Transformer-based structures. In this study, we aim to leverage the long …

Scaling laws for precision

T Kumar, Z Ankner, BF Spector, B Bordelon… - arXiv preprint arXiv …, 2024 - arxiv.org
Low-precision training and inference affect both the quality and cost of language models, but
current scaling laws do not account for this. In this work, we devise "precision-aware" scaling …

LinFusion: 1 GPU, 1 minute, 16K image

S Liu, W Yu, Z Tan, X Wang - arXiv preprint arXiv:2409.02097, 2024 - arxiv.org
Modern diffusion models, particularly those utilizing a Transformer-based UNet for
denoising, rely heavily on self-attention operations to manage complex spatial relationships …

Mamba or RWKV: Exploring high-quality and high-efficiency Segment Anything Model

H Yuan, X Li, L Qi, T Zhang, MH Yang, S Yan… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based segmentation methods face the challenge of efficient inference when
dealing with high-resolution images. Recently, several linear attention architectures, such as …

Autoregressive pretraining with Mamba in vision

S Ren, X Li, H Tu, F Wang, F Shu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The vision community has started to adopt the recently developed state space model,
Mamba, as a new backbone for a range of tasks. This paper shows that Mamba's visual …

PointRWKV: Efficient RWKV-like model for hierarchical point cloud learning

Q He, J Zhang, J Peng, H He, X Li, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have revolutionized the point cloud learning task, but their quadratic complexity
hinders extension to long sequences and imposes a burden on limited computational …