Sora: A review on background, technology, limitations, and opportunities of large vision models
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …
model is trained to generate videos of realistic or imaginative scenes from text instructions …
State of the art on diffusion models for visual computing
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …
Preserve your own correlation: A noise prior for video diffusion models
Despite tremendous progress in generating high-quality images using diffusion models,
synthesizing a sequence of animated frames that are both photorealistic and temporally …
synthesizing a sequence of animated frames that are both photorealistic and temporally …
Dynamicrafter: Animating open-domain images with video diffusion priors
Animating a still image offers an engaging visual experience. Traditional image animation
techniques mainly focus on animating natural scenes with stochastic dynamics (eg clouds …
techniques mainly focus on animating natural scenes with stochastic dynamics (eg clouds …
Pix2video: Video editing using image diffusion
Image diffusion models, trained on massive image collections, have emerged as the most
versatile image generator model in terms of quality and diversity. They support inverting real …
versatile image generator model in terms of quality and diversity. They support inverting real …
Photorealistic video generation with diffusion models
We present WALT, a diffusion transformer for photorealistic video generation from text
prompts. Our approach has two key design decisions. First, we use a causal encoder to …
prompts. Our approach has two key design decisions. First, we use a causal encoder to …
Videopoet: A large language model for zero-shot video generation
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
Stablevideo: Text-driven consistency-aware diffusion video editing
Diffusion-based methods can generate realistic images and videos, but they struggle to edit
existing objects in a video while preserving their geometry over time. This prevents diffusion …
existing objects in a video while preserving their geometry over time. This prevents diffusion …
Sparsectrl: Adding sparse controls to text-to-video diffusion models
The development of text-to-video (T2V), ie, generating videos with a given text prompt, has
been significantly advanced in recent years. However, relying solely on text prompts often …
been significantly advanced in recent years. However, relying solely on text prompts often …
Vbench: Comprehensive benchmark suite for video generative models
Video generation has witnessed significant advancements yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …
remains a challenge. A comprehensive evaluation benchmark for video generation is …