Deep compression autoencoder for efficient high-resolution diffusion models
We present Deep Compression Autoencoder (DC-AE), a new family of autoencoder models
for accelerating high-resolution diffusion models. Existing autoencoder models have …
SVDQuant: Absorbing outliers by low-rank components for 4-bit diffusion models
Diffusion models have been proven highly effective at generating high-quality images.
However, as these models grow larger, they require significantly more memory and suffer …
Accelerating auto-regressive text-to-image generation with training-free speculative Jacobi decoding
The current large auto-regressive models can generate high-quality, high-resolution images,
but these models require hundreds or even thousands of steps of next-token prediction …
Distillation-free one-step diffusion for real-world image super-resolution
J Li, J Cao, Z Zou, X Su, X Yuan, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models have been achieving excellent performance for real-world image super-
resolution (Real-ISR) with considerable computational costs. Current approaches are trying …
Rectified diffusion: Straightness is not your need in rectified flow
Diffusion models have greatly improved visual generation but are hindered by slow
generation speed due to the computationally intensive nature of solving generative ODEs …
One step diffusion via shortcut models
Diffusion models and flow-matching models have enabled generating diverse and realistic
images by learning to transfer noise to data. However, sampling from these models involves …
Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey
T Xie, Y Rong, P Zhang, L Liu - arXiv preprint arXiv:2412.06602, 2024 - arxiv.org
Text-to-speech (TTS), also known as speech synthesis, is a prominent research area that
aims to generate natural-sounding human speech from text. Recently, with the increasing …
Simpler diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion
Latent diffusion models have become the popular choice for scaling up diffusion models for
high resolution image synthesis. Compared to pixel-space models that are trained end-to …
Flash diffusion: Accelerating any conditional diffusion model for few steps image generation
In this paper, we propose an efficient, fast, and versatile distillation method to accelerate the
generation of pre-trained diffusion models: Flash Diffusion. The method reaches state-of-the …
Diffusion Adversarial Post-Training for One-Step Video Generation
The diffusion models are widely used for image and video generation, but their iterative
generation process is slow and expensive. While existing distillation approaches have …