Make-A-Video: Text-to-video generation without text-video data

U Singer, A Polyak, T Hayes, X Yin, J An… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose Make-A-Video, an approach for directly translating the tremendous recent
progress in Text-to-Image (T2I) generation to Text-to-Video (T2V). Our intuition is simple …

TryOnDiffusion: A tale of two UNets

L Zhu, D Yang, T Zhu, F Reda… - Proceedings of the …, 2023 - openaccess.thecvf.com
Given two images depicting a person and a garment worn by another person, our goal is to
generate a visualization of how the garment might look on the input person. A key challenge …

Extracting motion and appearance via inter-frame attention for efficient video frame interpolation

G Zhang, Y Zhu, H Wang, Y Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Effectively extracting inter-frame motion and appearance information is important for video
frame interpolation (VFI). Previous works either extract both types of information in a mixed …

Consistent view synthesis with pose-guided diffusion models

HY Tseng, Q Li, C Kim, S Alsisan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Novel view synthesis from a single image has been a cornerstone problem for many Virtual
Reality applications that provide immersive experiences. However, most existing techniques …

AMT: All-pairs multi-field transforms for efficient frame interpolation

Z Li, ZL Zhu, LH Han, Q Hou… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present All-Pairs Multi-Field Transforms (AMT), a new network architecture for
video frame interpolation. It is based on two essential designs. First, we build bidirectional …

Video interpolation with diffusion models

S Jain, D Watson, E Tabellion… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present VIDIM, a generative model for video interpolation that creates short videos
given a start and end frame. In order to achieve high fidelity and generate motions unseen in …

A vision chip with complementary pathways for open-world sensing

Z Yang, T Wang, Y Lin, Y Chen, H Zeng, J Pei, J Wang… - Nature, 2024 - nature.com
Image sensors face substantial challenges when dealing with dynamic, diverse and
unpredictable scenes in open-world applications. However, the development of image …

ToonCrafter: Generative cartoon interpolation

J Xing, H Liu, M Xia, Y Zhang, X Wang, Y Shan… - ACM Transactions on …, 2024 - dl.acm.org
We introduce ToonCrafter, a novel approach that transcends traditional correspondence-
based cartoon video interpolation, paving the way for generative interpolation. Traditional …

Towards scalable neural representation for diverse videos

B He, X Yang, H Wang, Z Wu, H Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Implicit neural representations (INR) have gained increasing attention in representing 3D
scenes and images, and have been recently applied to encode videos (e.g., NeRV, E-NeRV) …

VideoGen: A reference-guided latent diffusion approach for high-definition text-to-video generation

X Li, W Chu, Y Wu, W Yuan, F Liu, Q Zhang, F Li… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we present VideoGen, a text-to-video generation approach that can
generate a high-definition video with high frame fidelity and strong temporal consistency …