Deep compression autoencoder for efficient high-resolution diffusion models

J Chen, H Cai, J Chen, E Xie, S Yang, H Tang… - arxiv preprint arxiv …, 2024 - arxiv.org
We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models
for accelerating high-resolution diffusion models. Existing autoencoder models have …

SVDQuant: Absorbing outliers by low-rank components for 4-bit diffusion models

M Li, Y Lin, Z Zhang, T Cai, X Li, J Guo, E Xie… - arxiv preprint arxiv …, 2024 - arxiv.org
Diffusion models have been proven highly effective at generating high-quality images.
However, as these models grow larger, they require significantly more memory and suffer …

Accelerating auto-regressive text-to-image generation with training-free speculative jacobi decoding

Y Teng, H Shi, X Liu, X Ning, G Dai, Y Wang… - arxiv preprint arxiv …, 2024 - arxiv.org
The current large auto-regressive models can generate high-quality, high-resolution images,
but these models require hundreds or even thousands of steps of next-token prediction …

Distillation-free one-step diffusion for real-world image super-resolution

J Li, J Cao, Z Zou, X Su, X Yuan, Y Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org
Diffusion models have been achieving excellent performance for real-world image super-
resolution (Real-ISR) with considerable computational costs. Current approaches are trying …

Rectified diffusion: Straightness is not your need in rectified flow

FY Wang, L Yang, Z Huang, M Wang, H Li - arxiv preprint arxiv …, 2024 - arxiv.org
Diffusion models have greatly improved visual generation but are hindered by slow
generation speed due to the computationally intensive nature of solving generative ODEs …

One step diffusion via shortcut models

K Frans, D Hafner, S Levine, P Abbeel - arxiv preprint arxiv:2410.12557, 2024 - arxiv.org
Diffusion models and flow-matching models have enabled generating diverse and realistic
images by learning to transfer noise to data. However, sampling from these models involves …

Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey

T Xie, Y Rong, P Zhang, L Liu - arxiv preprint arxiv:2412.06602, 2024 - arxiv.org
Text-to-speech (TTS), also known as speech synthesis, is a prominent research area that
aims to generate natural-sounding human speech from text. Recently, with the increasing …

Simpler diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion

E Hoogeboom, T Mensink, J Heek, K Lamerigts… - arxiv preprint arxiv …, 2024 - arxiv.org
Latent diffusion models have become the popular choice for scaling up diffusion models for
high resolution image synthesis. Compared to pixel-space models that are trained end-to …

Flash diffusion: Accelerating any conditional diffusion model for few steps image generation

C Chadebec, O Tasar, E Benaroche… - arxiv preprint arxiv …, 2024 - arxiv.org
In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the
generation of pre-trained diffusion models: Flash Diffusion. The method reaches state-of-the …

Diffusion Adversarial Post-Training for One-Step Video Generation

S Lin, X Xia, Y Ren, C Yang, X Xiao, L Jiang - arxiv preprint arxiv …, 2025 - arxiv.org
The diffusion models are widely used for image and video generation, but their iterative
generation process is slow and expensive. While existing distillation approaches have …