Diffusion forcing: Next-token prediction meets full-sequence diffusion
This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is
trained to denoise a set of tokens with independent per-token noise levels. We apply …
ACDC: Autoregressive coherent multimodal generation using diffusion correction
Autoregressive models (ARMs) and diffusion models (DMs) represent two leading
paradigms in generative modeling, each excelling in distinct areas: ARMs in global context …
From slow bidirectional to fast causal video generators
Current video diffusion models achieve impressive generation quality but struggle in
interactive applications due to bidirectional attention dependencies. The generation of a …
Closed-loop diffusion control of complex physical systems
The control problems of complex physical systems have broad applications in science and
engineering. Previous studies have shown that generative control methods based on …
On conditional diffusion models for PDE simulations
Modelling partial differential equations (PDEs) is of crucial importance in science and
engineering, and it includes tasks ranging from forecasting to inverse problems, such as …
ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer
The recent surge of interest in comprehensive multimodal models has necessitated the
unification of diverse modalities. However, the unification suffers from disparate …
One Diffusion to Generate Them All
We introduce OneDiffusion, a versatile, large-scale diffusion model that seamlessly supports
bidirectional image synthesis and understanding across diverse tasks. It enables conditional …
Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding
Efficiently compressing high-dimensional audio signals into a compact and informative
latent space is crucial for various tasks, including generative modeling and music …
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Audio language models have recently emerged as a promising approach for various audio
generation tasks, relying on audio tokenizers to encode waveforms into sequences of …
Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory
Despite the considerable progress achieved in the long video generation problem, there is
still significant room to improve the consistency of the videos, particularly in terms of …