Make-A-Video: Text-to-video generation without text-video data
We propose Make-A-Video, an approach for directly translating the tremendous recent
progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple …
Extracting motion and appearance via inter-frame attention for efficient video frame interpolation
Effectively extracting inter-frame motion and appearance information is important for video
frame interpolation (VFI). Previous works either extract both types of information in a mixed …
Consistent view synthesis with pose-guided diffusion models
Novel view synthesis from a single image has been a cornerstone problem for many Virtual
Reality applications that provide immersive experiences. However, most existing techniques …
TryOnDiffusion: A tale of two UNets
Given two images depicting a person and a garment worn by another person, our goal is to
generate a visualization of how the garment might look on the input person. A key challenge …
ToonCrafter: Generative cartoon interpolation
We introduce ToonCrafter, a novel approach that transcends traditional correspondence-
based cartoon video interpolation, paving the way for generative interpolation. Traditional …
AMT: All-pairs multi-field transforms for efficient frame interpolation
We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for
video frame interpolation. It is based on two essential designs. First, we build bidirectional …
VideoGen: A reference-guided latent diffusion approach for high-definition text-to-video generation
In this paper, we present VideoGen, a text-to-video generation approach, which can
generate a high-definition video with high frame fidelity and strong temporal consistency …
Shape-aware text-driven layered video editing
Temporal consistency is essential for video editing applications. Existing work on layered
representation of videos allows propagating edits consistently to each frame. These …
Tell me what happened: Unifying text-guided video completion via multimodal masked video generation
Generating a video given the first several static frames is challenging, as it requires anticipating
reasonable future frames with temporal coherence. Besides video prediction, the ability to …
Towards scalable neural representation for diverse videos
Implicit neural representations (INR) have gained increasing attention in representing 3D
scenes and images, and have recently been applied to encode videos (e.g., NeRV, E-NeRV) …