A survey on vision Mamba: Models, applications and challenges

R Xu, S Yang, Y Wang, B Du, H Chen - arXiv preprint arXiv:2404.18861, 2024 - arxiv.org
Mamba, a recent selective structured state space model, performs excellently on long
sequence modeling tasks. Mamba mitigates the modeling constraints of convolutional …
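
The snippet names Mamba's core mechanism, the selective structured state space model (SSM). As a rough orientation, the NumPy sketch below shows the sequential form of a selective scan under the standard zero-order-hold discretization; the parameter names (W_B, W_C, W_dt, D_skip) are placeholders for this illustration, and the reference implementation instead uses a hardware-aware parallel scan.

    import numpy as np

    def selective_scan(x, A, W_B, W_C, W_dt, D_skip):
        # x: (L, D) input sequence; A: (D, N) per-channel state matrix
        # W_B, W_C: (D, N) make B_t and C_t depend on the input ("selective")
        # W_dt: (D, D) produces the per-channel step size Delta_t; D_skip: (D,) skip term
        L, D = x.shape
        h = np.zeros((D, A.shape[1]))            # one N-dimensional state per channel
        y = np.zeros((L, D))
        for t in range(L):
            xt = x[t]
            dt = np.log1p(np.exp(xt @ W_dt))     # softplus -> positive step size, shape (D,)
            Bt, Ct = xt @ W_B, xt @ W_C          # input-dependent projections, shape (N,)
            A_bar = np.exp(dt[:, None] * A)      # ZOH discretization of A -> (D, N)
            B_bar = dt[:, None] * Bt[None, :]    # simplified discretization of B -> (D, N)
            h = A_bar * h + B_bar * xt[:, None]  # h_t = A_bar h_{t-1} + B_bar x_t
            y[t] = h @ Ct + D_skip * xt          # y_t = C_t h_t + D x_t
        return y

The "selective" part is that B_t, C_t, and the step size are functions of the current token, unlike a time-invariant SSM; this is what lets the model decide what to retain or discard along the sequence.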

Mamba-360: Survey of state space models as Transformer alternative for long sequence modelling: Methods, applications, and challenges

BN Patro, VS Agneeswaran - arXiv preprint arXiv:2404.16112, 2024 - arxiv.org
Sequence modeling is a crucial area across various domains, including Natural Language
Processing (NLP), speech recognition, time series forecasting, music generation, and …

Jamba: A hybrid Transformer-Mamba language model

O Lieber, B Lenz, H Bata, G Cohen, J Osin… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Jamba, a new base large language model based on a novel hybrid Transformer-
Mamba mixture-of-experts (MoE) architecture. Specifically, Jamba interleaves blocks of …
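
The snippet cuts off at the key architectural detail; the interleaving ratio and MoE placement are specified in the paper and not reproduced here. Purely as a schematic of the hybrid pattern, the PyTorch sketch below alternates attention and Mamba token mixers according to a user-supplied layer schedule, with the block internals left as placeholder factory functions.

    from torch import nn

    class HybridStack(nn.Module):
        # Schematic only: the schedule and block factories are placeholders,
        # not Jamba's actual configuration.
        def __init__(self, d_model, schedule, make_attn, make_mamba, make_ffn, make_moe_ffn):
            super().__init__()
            self.layers = nn.ModuleList()
            for kind, use_moe in schedule:       # e.g. [("mamba", False), ("attn", True), ...]
                self.layers.append(nn.ModuleDict({
                    "norm1": nn.LayerNorm(d_model),
                    "mixer": make_attn(d_model) if kind == "attn" else make_mamba(d_model),
                    "norm2": nn.LayerNorm(d_model),
                    "ffn": make_moe_ffn(d_model) if use_moe else make_ffn(d_model),
                }))

        def forward(self, x):
            for layer in self.layers:
                x = x + layer["mixer"](layer["norm1"](x))  # token mixing: attention or Mamba
                x = x + layer["ffn"](layer["norm2"](x))    # channel mixing: dense or MoE FFN
            return x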

A survey of mamba

H Qu, L Ning, R An, W Fan, T Derr, H Liu, X Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
As one of the most representative deep learning (DL) techniques, the Transformer architecture has
empowered numerous advanced models, especially the large language models (LLMs) that comprise …

Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality

T Dao, A Gu - arXiv preprint arXiv:2405.21060, 2024 - arxiv.org
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …
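
The "structured state space duality" in the title refers to the fact that one sequence transformation admits both a linear-time recurrent form and a quadratic, attention-like matrix form. A rough statement in generic notation (not the paper's exact formulation): a time-varying SSM

    h_t = A_t h_{t-1} + B_t x_t,        y_t = C_t^\top h_t

unrolls to

    y_t = \sum_{s \le t} C_t^\top \Big( \prod_{k=s+1}^{t} A_k \Big) B_s \, x_s = \sum_{s \le t} M_{ts} x_s,

i.e. y = M x with a lower-triangular mixing matrix M, which is structurally the same computation as causally masked attention with scores M_{ts}.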

Repeat after me: Transformers are better than state space models at copying

S Jelassi, D Brandfonbrener, SM Kakade… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers are the dominant architecture for sequence modeling, but there is growing
interest in models that use a fixed-size latent state that does not depend on the sequence …
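
One intuition behind the title's claim, stated here as a standard counting argument rather than the paper's exact theorem: a model that summarizes the prefix in a fixed state of b bits can distinguish at most 2^b prefixes, yet copying requires distinguishing all |\Sigma|^n inputs of length n, so verbatim copying forces

    b \ge n \log_2 |\Sigma|,

whereas an attention model can re-read the input through a cache that grows with n.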

Integration of Mamba and Transformer-MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics

W Zhang, J Huang, R Wang, C Wei… - 2024 International …, 2024 - ieeexplore.ieee.org
Long-short range time series forecasting is essential for predicting future trends and patterns
over extended periods. While deep learning models such as Transformers have made …

Is Mamba capable of in-context learning?

R Grazzi, J Siems, S Schrodi, T Brox… - arXiv preprint arXiv …, 2024 - arxiv.org
This work provides empirical evidence that Mamba, a newly proposed selective structured
state space model, has in-context learning (ICL) capabilities similar to those of Transformers. We …

PointRWKV: Efficient RWKV-like model for hierarchical point cloud learning

Q He, J Zhang, J Peng, H He, X Li, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have revolutionized point cloud learning, but their quadratic complexity
hinders extension to long sequences and places a burden on limited computational …
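
For context on the complexity claim, the standard per-layer costs (n = number of tokens/points, d = model width, N = fixed per-channel state size) are roughly:

    self-attention:     \Theta(n^2 d) time, with a cache that grows with n
    RWKV / SSM mixers:  \Theta(n d N) time, with a fixed-size state of O(d N)

which is why linear recurrent mixers are attractive for long point sequences on constrained hardware.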

A survey on efficient inference for large language models

Z Zhou, X Ning, K Hong, T Fu, J Xu, S Li, Y Lou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have attracted extensive attention due to their remarkable
performance across various tasks. However, the substantial computational and memory …