A survey on video diffusion models
The recent wave of AI-generated content (AIGC) has witnessed substantial success in
computer vision, with the diffusion model playing a crucial role in this achievement. Due to …
State of the art on diffusion models for visual computing
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …
Videocrafter2: Overcoming data limitations for high-quality video diffusion models
Text-to-video generation aims to produce a video based on a given prompt. Recently
several commercial video models have been able to generate plausible videos with minimal …
Pix2video: Video editing using image diffusion
Image diffusion models, trained on massive image collections, have emerged as the most
versatile image generator model in terms of quality and diversity. They support inverting real …
Stablevideo: Text-driven consistency-aware diffusion video editing
Diffusion-based methods can generate realistic images and videos, but they struggle to edit
existing objects in a video while preserving their geometry over time. This prevents diffusion …
Photorealistic video generation with diffusion models
We present WALT, a diffusion transformer for photorealistic video generation from text
prompts. Our approach has two key design decisions. First, we use a causal encoder to …
Videopoet: A large language model for zero-shot video generation
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
Latte: Latent diffusion transformer for video generation
We propose a novel Latent Diffusion Transformer, namely Latte, for video generation. Latte
first extracts spatio-temporal tokens from input videos and then adopts a series of …
Latent video diffusion models for high-fidelity long video generation
AI-generated content has attracted lots of attention recently, but photo-realistic video
synthesis is still challenging. Although many attempts using GANs and autoregressive …
Motiondirector: Motion customization of text-to-video diffusion models
Large-scale pre-trained diffusion models have exhibited remarkable capabilities in diverse
video generations. Given a set of video clips of the same motion concept, the task of Motion …