Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …
Gated linear attention transformers with hardware-efficient training
Transformers with linear attention allow for efficient parallel training but can simultaneously
be formulated as an RNN with 2D (matrix-valued) hidden states, thus enjoying linear-time …
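To illustrate the claim in the snippet above, here is a minimal sketch of causal linear attention computed as a recurrence with a 2D (matrix-valued) hidden state, for a single head in NumPy. The feature map (elu(x)+1) and the omission of GLA's data-dependent gating are simplifying assumptions for brevity, not the paper's exact formulation.

```python
import numpy as np

def linear_attention_recurrent(Q, K, V):
    """Causal linear attention as an RNN with a matrix-valued state.

    Q, K, V: (T, d) arrays for one head. Each step costs O(d^2),
    so the whole sequence is linear in T.
    """
    phi = lambda x: np.where(x > 0, x + 1.0, np.exp(x))  # elu(x) + 1, kept positive
    T, d = Q.shape
    S = np.zeros((d, d))      # matrix-valued state: sum_s phi(k_s) v_s^T
    z = np.zeros(d)           # normaliser state: sum_s phi(k_s)
    out = np.zeros_like(V, dtype=float)
    for t in range(T):
        q, k, v = phi(Q[t]), phi(K[t]), V[t]
        S += np.outer(k, v)
        z += k
        out[t] = (S.T @ q) / (z @ q + 1e-6)
    return out
```

The same quantity can also be computed in parallel over all positions at training time, which is the training/inference duality the snippet alludes to.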
The Mamba in the Llama: Distilling and accelerating hybrid models
Linear RNN architectures, like Mamba, can be competitive with Transformer models in
language modeling while having advantageous deployment characteristics. Given the focus …
Hydra: Bidirectional state space models through generalized matrix mixers
A wide array of sequence models are built on a framework modeled after Transformers,
comprising alternating sequence mixer and channel mixer layers. This paper studies a …
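The Hydra snippet refers to the now-standard layout of alternating sequence mixers and channel mixers. Below is a hedged sketch of that generic block structure only; the sequence mixer is left abstract, and the specific matrix-mixer family the paper studies is not reproduced here. Normalisation is omitted to keep the sketch short.

```python
import numpy as np

def channel_mixer(x, W_in, W_out):
    """Position-wise MLP: mixes information across feature channels only."""
    return np.maximum(x @ W_in, 0.0) @ W_out  # (T, d) -> (T, d)

def mixer_block(x, sequence_mixer, W_in, W_out):
    """Generic Transformer-style block: a sequence mixer (attention, SSM, ...)
    followed by a channel mixer, each wrapped in a residual connection."""
    x = x + sequence_mixer(x)               # mixes across the time axis
    x = x + channel_mixer(x, W_in, W_out)   # mixes across channels, per position
    return x
```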
Samba: Simple hybrid state space models for efficient unlimited context language modeling
Efficiently modeling sequences with infinite context length has been a long-standing
problem. Past works suffer from either the quadratic computation complexity or the limited …
MetaLA: Unified optimal linear approximation to softmax attention map
Various linear complexity models, such as Linear Transformer (LinFormer), State Space
Model (SSM), and Linear RNN (LinRNN), have been proposed to replace the conventional …
How to train long-context language models (effectively)
We study continued training and supervised fine-tuning (SFT) of a language model (LM) to
make effective use of long-context information. We first establish a reliable evaluation …
Venturing into uncharted waters: The navigation compass from Transformer to Mamba
Y Zou, Y Chen, Z Li, L Zhang, H Zhao - arXiv preprint arXiv:2406.16722, 2024 - arxiv.org
Transformer, a deep neural network architecture, has long dominated the field of natural
language processing and beyond. Nevertheless, the recent introduction of Mamba …
Orchid: Flexible and data-dependent convolution for sequence modeling
In the rapidly evolving field of deep learning, the demand for models that are both
expressive and computationally efficient has never been more critical. This paper introduces …
Just read twice: closing the recall gap for recurrent language models
Recurrent large language models that compete with Transformers in language modeling
perplexity are emerging at a rapid rate (e.g., Mamba, RWKV). Excitingly, these architectures …