Sora: A review on background, technology, limitations, and opportunities of large vision models
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …
State of the art on diffusion models for visual computing
The field of visual computing is rapidly advancing due to the emergence of generative
artificial intelligence (AI), which unlocks unprecedented capabilities for the generation …
Show-1: Marrying pixel and latent diffusion models for text-to-video generation
Significant advancements have been achieved in the realm of large-scale pre-trained text-to-
video Diffusion Models (VDMs). However, previous methods either rely solely on pixel …
VideoPoet: A large language model for zero-shot video generation
We present VideoPoet, a language model capable of synthesizing high-quality video, with
matching audio, from a large variety of conditioning signals. VideoPoet employs a decoder …
Fast high-resolution image synthesis with latent adversarial diffusion distillation
Diffusion models are the main driver of progress in image and video synthesis, but suffer
from slow inference speed. Distillation methods, like the recently introduced adversarial …
SparseCtrl: Adding sparse controls to text-to-video diffusion models
The development of text-to-video (T2V), i.e., generating videos with a given text prompt, has
been significantly advanced in recent years. However, relying solely on text prompts often …
VBench: Comprehensive benchmark suite for video generative models
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …
PhysDreamer: Physics-based interaction with 3D objects via video generation
Realistic object interactions are crucial for creating immersive virtual experiences, yet
synthesizing realistic 3D object dynamics in response to novel interactions remains a …
Dynamical regimes of diffusion models
We study generative diffusion models in the regime where both the data dimension and the
sample size are large, and the score function is trained optimally. Using statistical physics …
When does Sora show: The beginning of TAO to imaginative intelligence and scenarios engineering
During our discussion at workshops for writing “What Does ChatGPT Say: The DAO from
Algorithmic Intelligence to Linguistic Intelligence”[1], we had expected the next milestone for …