Diffusion forcing: Next-token prediction meets full-sequence diffusion

B Chen, DM Monso, Y Du, M Simchowitz… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents Diffusion Forcing, a new training paradigm where a diffusion model is
trained to denoise a set of tokens with independent per-token noise levels. We apply …
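The core idea in the snippet above is that each token in a sequence is noised to its own, independently sampled level, rather than sharing one level across the whole sequence. The following is a minimal Python sketch of that noising step. It assumes a DDPM-style forward process with a linear beta schedule (beta from 1e-4 to 0.02), tokens represented as plain float vectors, and a noise-prediction (epsilon) training target. The schedule, the function names `alpha_bar`, `noise_tokens` and `mse`, and the objective are illustrative assumptions, not details taken from the paper.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Token = List[float]


def alpha_bar(k: int, num_steps: int, beta_min: float = 1e-4, beta_max: float = 0.02) -> float:
    """Signal fraction kept after k forward steps (ASSUMED linear beta schedule)."""
    prod = 1.0
    for i in range(k):
        beta = beta_min + (beta_max - beta_min) * i / max(num_steps - 1, 1)
        prod *= 1.0 - beta
    return prod


def noise_tokens(
    tokens: Sequence[Token],
    num_steps: int,
    rng: random.Random,
    levels: Optional[Sequence[int]] = None,
) -> Tuple[List[Token], List[int], List[Token]]:
    """Noise every token with its own noise level (0 = clean, num_steps = ~pure noise).

    Returns (noisy_tokens, per_token_levels, gaussian_noise). A denoiser
    (hypothetical, not implemented here) would be trained to predict the
    noise from the noisy tokens together with their per-token levels.
    """
    if levels is None:
        # One independent level per token; full-sequence diffusion would
        # instead draw a single level shared by all tokens.
        levels = [rng.randint(0, num_steps) for _ in tokens]
    noisy, eps_all = [], []
    for x, k in zip(tokens, levels):
        ab = alpha_bar(k, num_steps)
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        noisy.append([math.sqrt(ab) * xi + math.sqrt(1.0 - ab) * ei for xi, ei in zip(x, eps)])
        eps_all.append(eps)
    return noisy, list(levels), eps_all


def mse(pred: Sequence[Token], target: Sequence[Token]) -> float:
    """Mean squared error between predicted and true noise over all entries."""
    diffs = [(p - t) ** 2 for pt, tt in zip(pred, target) for p, t in zip(pt, tt)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    rng = random.Random(0)
    seq = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    noisy, levels, eps = noise_tokens(seq, num_steps=1000, rng=rng)
    print("per-token noise levels:", levels)
```

Passing explicit `levels` lets a caller fix the per-token noise, for example keeping earlier tokens clean (level 0) while later ones stay noisy, which is the kind of mixed-noise sequence this training setup exposes the denoiser to.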

ACDC: Autoregressive coherent multimodal generation using diffusion correction

H Chung, D Lee, JC Ye - arXiv preprint arXiv:2410.04721, 2024 - arxiv.org
Autoregressive models (ARMs) and diffusion models (DMs) represent two leading
paradigms in generative modeling, each excelling in distinct areas: ARMs in global context …

From slow bidirectional to fast causal video generators

T Yin, Q Zhang, R Zhang, WT Freeman… - arXiv preprint arXiv …, 2024 - arxiv.org
Current video diffusion models achieve impressive generation quality but struggle in
interactive applications due to bidirectional attention dependencies. The generation of a …

Closed-loop diffusion control of complex physical systems

L Wei, H Feng, Y Yang, R Feng, P Hu, X Zheng… - arXiv preprint arXiv …, 2024 - arxiv.org
The control problems of complex physical systems have broad applications in science and
engineering. Previous studies have shown that generative control methods based on …

On conditional diffusion models for PDE simulations

A Shysheya, C Diaconu, F Bergamin… - arXiv preprint arXiv …, 2024 - arxiv.org
Modelling partial differential equations (PDEs) is of crucial importance in science and
engineering, and it includes tasks ranging from forecasting to inverse problems, such as …

ACDiT: Interpolating Autoregressive Conditional Modeling and Diffusion Transformer

J Hu, S Hu, Y Song, Y Huang, M Wang, H Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent surge of interest in comprehensive multimodal models has necessitated the
unification of diverse modalities. However, the unification suffers from disparate …

One Diffusion to Generate Them All

DH Le, T Pham, S Lee, C Clark, A Kembhavi… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce OneDiffusion, a versatile, large-scale diffusion model that seamlessly supports
bidirectional image synthesis and understanding across diverse tasks. It enables conditional …

Music2Latent2: Audio Compression with Summary Embeddings and Autoregressive Decoding

M Pasini, S Lattner, G Fazekas - arXiv preprint arXiv:2501.17578, 2025 - arxiv.org
Efficiently compressing high-dimensional audio signals into a compact and informative
latent space is crucial for various tasks, including generative modeling and music …

Autoregressive Diffusion Transformer for Text-to-Speech Synthesis

Z Liu, S Wang, S Inoue, Q Bai, H Li - arXiv preprint arXiv:2406.05551, 2024 - arxiv.org
Audio language models have recently emerged as a promising approach for various audio
generation tasks, relying on audio tokenizers to encode waveforms into sequences of …

Enhancing Multi-Text Long Video Generation Consistency without Tuning: Time-Frequency Analysis, Prompt Alignment, and Theory

X Li, F Zhang, J Pan, Y Hou, VYF Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the considerable progress achieved in the long video generation problem, there is
still significant room to improve the consistency of the videos, particularly in terms of …