Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality
While Transformers have been the main architecture behind deep learning's success in
language modeling, state-space models (SSMs) such as Mamba have recently been shown …
Gated linear attention transformers with hardware-efficient training
Transformers with linear attention allow for efficient parallel training but can simultaneously
be formulated as an RNN with 2D (matrix-valued) hidden states, thus enjoying linear-time …
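The recurrent view mentioned in this snippet is easy to make concrete. The sketch below is only an illustrative, unnormalized linear-attention recurrence with a matrix-valued hidden state, written under assumed shapes; it omits the paper's gating and its hardware-efficient chunked training.

    import numpy as np

    def linear_attention_recurrent(Q, K, V):
        # Q, K, V: (T, d) arrays. The hidden state S is a d x d matrix that
        # accumulates outer products k_t v_t^T, so each step costs O(d^2) and
        # the sequence is processed in linear time (no T x T attention matrix).
        T, d = Q.shape
        S = np.zeros((d, d))              # 2D (matrix-valued) hidden state
        out = np.zeros((T, d))
        for t in range(T):
            S = S + np.outer(K[t], V[t])  # state update (ungated here)
            out[t] = Q[t] @ S             # o_t = q_t^T S = sum_{s<=t} (q_t . k_s) v_s
        return out

A gated variant in the spirit of the title would multiply S by a data-dependent decay before each update; the exact gating and the chunkwise, hardware-efficient training algorithm are what the cited paper addresses.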
xLSTM: Extended long short-term memory
In the 1990s, the constant error carousel and gating were introduced as the central ideas of
the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and …
Learning to (learn at test time): RNNs with expressive hidden states
Self-attention performs well in long context but has quadratic complexity. Existing RNN
layers have linear complexity, but their performance in long context is limited by the …
ZigMa: A DiT-style zigzag Mamba diffusion model
Diffusion models have long been plagued by scalability and quadratic complexity issues,
especially within transformer-based structures. In this study, we aim to leverage the long …
Scaling laws for precision
Low precision training and inference affect both the quality and cost of language models, but
current scaling laws do not account for this. In this work, we devise "precision-aware" scaling …
LinFusion: 1 GPU, 1 minute, 16K image
Modern diffusion models, particularly those utilizing a Transformer-based UNet for
denoising, rely heavily on self-attention operations to manage complex spatial relationships …
Mamba or RWKV: Exploring high-quality and high-efficiency segment anything model
Transformer-based segmentation methods face the challenge of efficient inference when
dealing with high-resolution images. Recently, several linear attention architectures, such as …
Autoregressive pretraining with Mamba in vision
The vision community has started to build with the recently developed state space model,
Mamba, as the new backbone for a range of tasks. This paper shows that Mamba's visual …
PointRWKV: Efficient RWKV-like model for hierarchical point cloud learning
Transformers have revolutionized point cloud learning, but their quadratic complexity
hinders extension to long sequences and places a burden on limited computational …