Sparse VideoGen: Accelerating Video Diffusion Transformers with Spatial-Temporal Sparsity
Diffusion Transformers (DiTs) dominate video generation but their high computational cost
severely limits real-world applicability, usually requiring tens of minutes to generate a few …
Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models
J Yao, X Wang - arXiv preprint arXiv:2501.01423, 2025 - arxiv.org
Latent diffusion models with Transformer architectures excel at generating high-fidelity
images. However, recent studies reveal an optimization dilemma in this two-stage design …
SnapGen: Taming High-Resolution Text-to-Image Models for Mobile Devices with Efficient Architectures and Training
Existing text-to-image (T2I) diffusion models face several limitations, including large model
sizes, slow runtime, and low-quality generation on mobile devices. This paper aims to …
SANA 1.5: Efficient Scaling of Training-Time and Inference-Time Compute in Linear Diffusion Transformer
This paper presents SANA-1.5, a linear Diffusion Transformer for efficient scaling in text-to-image
generation. Building upon SANA-1.0, we introduce three key innovations: (1) Efficient …
TinyFusion: Diffusion Transformers Learned Shallow
Diffusion Transformers have demonstrated remarkable capabilities in image generation but
often come with excessive parameterization, resulting in considerable inference overhead in …
LiT: Delving into a Simplified Linear Diffusion Transformer for Image Generation
In commonly used sub-quadratic complexity modules, linear attention benefits from
simplicity and high parallelism, making it promising for image synthesis tasks. However, the …
Improving the Diffusability of Autoencoders
Latent diffusion models have emerged as the leading approach for generating high-quality
images and videos, utilizing compressed latent representations to reduce the computational …
SnapGen-V: Generating a Five-Second Video within Five Seconds on a Mobile Device
We have witnessed the unprecedented success of diffusion-based video generation over
the past year. Recently proposed models from the community have wielded the power to …
Mimir: Improving Video Diffusion Models for Precise Text Understanding
Text serves as the key control signal in video generation due to its narrative nature. To
render text descriptions into video clips, current video diffusion models borrow features from …
Magic 1-For-1: Generating One Minute Video Clips within One Minute
In this technical report, we present Magic 1-For-1 (Magic141), an efficient video generation
model with optimized memory consumption and inference latency. The key idea is simple …