Visual autoregressive modeling: Scalable image generation via next-scale prediction

K Tian, Y Jiang, Z Yuan, B Peng… - Advances in neural …, 2025 - proceedings.neurips.cc
We present Visual AutoRegressive modeling (VAR), a new generation paradigm
that redefines the autoregressive learning on images as coarse-to-fine "next-scale …

Simplified and generalized masked diffusion for discrete data

J Shi, K Han, Z Wang, A Doucet… - Advances in Neural …, 2025 - proceedings.neurips.cc
Masked (or absorbing) diffusion is actively explored as an alternative to autoregressive
models for generative modeling of discrete data. However, existing work in this area has …

FreeLong: Training-free long video generation with SpectralBlend temporal attention

Y Lu, Y Liang, L Zhu, Y Yang - Advances in Neural …, 2025 - proceedings.neurips.cc
Video diffusion models have made substantial progress in various video generation
applications. However, training models for long video generation tasks requires significant …

Vidu4D: Single generated video to high-fidelity 4D reconstruction with dynamic Gaussian surfels

Y Wang, X Wang, Z Chen, Z Wang… - Advances in Neural …, 2025 - proceedings.neurips.cc
Video generative models are receiving particular attention given their ability to generate
realistic and imaginative frames. Besides, these models are also observed to exhibit strong …

T2VSafetyBench: Evaluating the safety of text-to-video generative models

Y Miao, Y Zhu, L Yu, J Zhu, XS Gao… - Advances in Neural …, 2025 - proceedings.neurips.cc
The recent development of Sora leads to a new era in text-to-video (T2V) generation. Along
with this comes the rising concern about its safety risks. The generated videos may contain …

MimicMotion: High-quality human motion video generation with confidence-aware pose guidance

Y Zhang, J Gu, LW Wang, H Wang, J Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, generative artificial intelligence has achieved significant advancements in
the field of image generation, spawning a variety of applications. However, video generation …

Neural residual diffusion models for deep scalable vision generation

Z Ma, L Zhao, B Qi, B Zhou - Advances in Neural …, 2025 - proceedings.neurips.cc
The most advanced diffusion models have recently adopted increasingly deep stacked
networks (e.g., U-Net or Transformer) to promote the generative emergence capabilities of …

Pandora: Towards general world model with natural language actions and video states

J Xiang, G Liu, Y Gu, Q Gao, Y Ning, Y Zha… - arXiv preprint arXiv …, 2024 - arxiv.org
World models simulate future states of the world in response to different actions. They
facilitate interactive content creation and provide a foundation for grounded, long-horizon …

Infinity: Scaling bitwise autoregressive modeling for high-resolution image synthesis

J Han, J Liu, Y Jiang, B Yan, Y Zhang, Z Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Infinity, a Bitwise Visual AutoRegressive Modeling capable of generating high-
resolution, photorealistic images following language instruction. Infinity redefines visual …

OD-VAE: An omni-dimensional video compressor for improving latent video diffusion model

L Chen, Z Li, B Lin, B Zhu, Q Wang, S Yuan… - arXiv preprint arXiv …, 2024 - arxiv.org
Variational Autoencoder (VAE), compressing videos into latent representations, is a crucial
preceding component of Latent Video Diffusion Models (LVDMs). With the same …