Parallelizing linear transformers with the delta rule over sequence length

S Yang, B Wang, Y Zhang, Y Shen, Y Kim - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers with linear attention (i.e., linear transformers) and state-space models have
recently been suggested as a viable linear-time alternative to transformers with softmax …
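
The snippet refers to the delta-rule update for linear attention. As a reading aid, here is a naive sequential PyTorch sketch of that recurrence, S_t = S_{t-1}(I - beta_t k_t k_t^T) + beta_t v_t k_t^T with output o_t = S_t q_t; the paper itself concerns a chunkwise-parallel algorithm over sequence length, which this loop does not reproduce, and the shapes and key normalization below are assumptions for illustration.

```python
import torch

def delta_rule_recurrence(q, k, v, beta):
    """Naive sequential reference for the delta-rule recurrence
    S_t = S_{t-1}(I - beta_t k_t k_t^T) + beta_t v_t k_t^T,  o_t = S_t q_t.
    Shapes (assumed): q, k: (T, d_k); v: (T, d_v); beta: (T,).
    The paper's contribution is a chunkwise-parallel algorithm that avoids
    this O(T) sequential loop; this version is only for readability."""
    d_v, d_k = v.shape[-1], k.shape[-1]
    S = torch.zeros(d_v, d_k, dtype=q.dtype)      # recurrent "fast weight" state
    outputs = []
    for t in range(q.shape[0]):
        k_t = k[t] / k[t].norm().clamp(min=1e-6)  # L2-normalized key (common choice)
        v_pred = S @ k_t                          # value currently stored under k_t
        # delta rule: move the stored value for k_t toward the target v_t
        S = S + beta[t] * torch.outer(v[t] - v_pred, k_t)
        outputs.append(S @ q[t])                  # read out with the query
    return torch.stack(outputs)                   # (T, d_v)
```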

ARFlow: Autoregressive Flow with Hybrid Linear Attention

M Hui, RJ Zhu, S Yang, Y Zhang, Z Wang… - arXiv preprint arXiv …, 2025 - arxiv.org
Flow models are effective at progressively generating realistic images, but they generally
struggle to capture long-range dependencies during the generation process as they …

CLEAR: Conv-Like Linearization Revs Pre-Trained Diffusion Transformers Up

S Liu, Z Tan, X Wang - arXiv preprint arXiv:2412.16112, 2024 - arxiv.org
Diffusion Transformers (DiT) have become a leading architecture in image generation.
However, the quadratic complexity of attention mechanisms, which are responsible for …
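
The abstract points to replacing quadratic attention with a convolution-like local form. The generic sliding-window attention sketch below illustrates why restricting each query to a fixed-radius neighborhood makes the cost scale linearly with sequence length; the window shape, any 2-D image-token layout, and how the pre-trained DiT is adapted are not taken from the paper.

```python
import torch
import torch.nn.functional as F

def local_window_attention(q, k, v, radius=8):
    """Generic sliding-window ("conv-like") attention: each query attends only
    to keys within a fixed 1-D radius, so the work per query is constant and
    total cost is linear in sequence length.  Shapes (assumed): q, k, v: (T, d).
    The dense (T, T) score matrix below is kept only for clarity; a real
    linear-time kernel would compute just the banded scores."""
    T, d = q.shape
    scores = q @ k.T / d ** 0.5
    idx = torch.arange(T)
    outside = (idx[None, :] - idx[:, None]).abs() > radius   # True outside the band
    scores = scores.masked_fill(outside, float("-inf"))
    return F.softmax(scores, dim=-1) @ v                     # (T, d)
```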

Forgetting Transformer: Softmax Attention with a Forget Gate

Z Lin, E Nikishin, X He, A Courville - The Thirteenth International … - openreview.net
An essential component of modern recurrent sequence models is the *forget gate*. While
Transformers do not have an explicit recurrent form, we show that a forget gate can be …
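
As a reading aid for the forget-gate idea, here is a minimal PyTorch sketch that folds a scalar, data-dependent forget gate into causal softmax attention as an additive bias of accumulated log-forget values. The sigmoid parameterization and the single-head, unbatched shapes are assumptions for illustration, not necessarily the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def forgetting_attention(q, k, v, forget_logits):
    """Causal softmax attention with a scalar forget gate folded in as an
    additive bias: with f_t = sigmoid(forget_logits[t]), the score of query i
    for key j (j <= i) gains sum_{t=j+1..i} log f_t, so older positions are
    down-weighted by every gate in between.  Shapes (assumed): q, k, v: (T, d);
    forget_logits: (T,)."""
    T, d = q.shape
    log_f = F.logsigmoid(forget_logits)          # log f_t, each <= 0
    c = torch.cumsum(log_f, dim=0)               # c_i = sum_{t <= i} log f_t
    bias = c[:, None] - c[None, :]               # bias[i, j] = sum_{t=j+1..i} log f_t
    scores = q @ k.T / d ** 0.5 + bias
    future = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
    scores = scores.masked_fill(future, float("-inf"))
    return F.softmax(scores, dim=-1) @ v         # (T, d)
```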

FlashSampling: Fast and Memory-Efficient Exact Sampling with Group-Gumbel-Max

Z Qin, X Shen, Y Zhang, Y Zhong - openreview.net
Sampling operations in discrete space are widely used in different fields such as language
models, reinforcement learning, VAE, GAN, and neural architecture search. Current …
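
The method name points to the Gumbel-Max trick for exact categorical sampling. The sketch below shows only the standard trick, argmax_i (logits_i + g_i) with g_i ~ Gumbel(0, 1), which draws an exact sample from softmax(logits); the grouped variant and the memory-efficiency machinery suggested by the title are not reproduced here.

```python
import torch

def gumbel_max_sample(logits):
    """Standard Gumbel-Max trick: argmax_i (logits_i + g_i) with
    g_i ~ Gumbel(0, 1) is an exact sample from the categorical distribution
    softmax(logits).  Only the classic trick is shown here."""
    u = torch.rand_like(logits).clamp(min=1e-12, max=1 - 1e-12)
    gumbel = -torch.log(-torch.log(u))           # Gumbel(0, 1) noise
    return torch.argmax(logits + gumbel, dim=-1)
```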