Make-A-Video: Text-to-video generation without text-video data

U Singer, A Polyak, T Hayes, X Yin, J An… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose Make-A-Video--an approach for directly translating the tremendous recent
progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple …

Extracting motion and appearance via inter-frame attention for efficient video frame interpolation

G Zhang, Y Zhu, H Wang, Y Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Effectively extracting inter-frame motion and appearance information is important for video
frame interpolation (VFI). Previous works either extract both types of information in a mixed …

Consistent view synthesis with pose-guided diffusion models

HY Tseng, Q Li, C Kim, S Alsisan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Novel view synthesis from a single image has been a cornerstone problem for many Virtual
Reality applications that provide immersive experiences. However, most existing techniques …

TryOnDiffusion: A tale of two UNets

L Zhu, D Yang, T Zhu, F Reda… - Proceedings of the …, 2023 - openaccess.thecvf.com
Given two images depicting a person and a garment worn by another person, our goal is to
generate a visualization of how the garment might look on the input person. A key challenge …

ToonCrafter: Generative cartoon interpolation

J Xing, H Liu, M Xia, Y Zhang, X Wang, Y Shan… - ACM Transactions on …, 2024 - dl.acm.org
We introduce ToonCrafter, a novel approach that transcends traditional correspondence-
based cartoon video interpolation, paving the way for generative interpolation. Traditional …

AMT: All-pairs multi-field transforms for efficient frame interpolation

Z Li, ZL Zhu, LH Han, Q Hou… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for
video frame interpolation. It is based on two essential designs. First, we build bidirectional …

VideoGen: A reference-guided latent diffusion approach for high definition text-to-video generation

X Li, W Chu, Y Wu, W Yuan, F Liu, Q Zhang, F Li… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we present VideoGen, a text-to-video generation approach, which can
generate a high-definition video with high frame fidelity and strong temporal consistency …

Shape-aware text-driven layered video editing

YC Lee, JZG Jang, YT Chen, E Qiu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Temporal consistency is essential for video editing applications. Existing work on layered
representation of videos allows propagating edits consistently to each frame. These …

Tell me what happened: Unifying text-guided video completion via multimodal masked video generation

TJ Fu, L Yu, N Zhang, CY Fu, JC Su… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generating a video given the first several static frames is challenging as it anticipates
reasonable future frames with temporal coherence. Besides video prediction, the ability to …

Towards scalable neural representation for diverse videos

B He, X Yang, H Wang, Z Wu, H Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Implicit neural representations (INR) have gained increasing attention in representing 3D
scenes and images, and have been recently applied to encode videos (e.g., NeRV, E-NeRV) …