Motion-I2V: Consistent and controllable image-to-video generation with explicit motion modeling

X Shi, Z Huang, FY Wang, W Bian, D Li… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
We introduce Motion-I2V, a novel framework for consistent and controllable text-guided
image-to-video generation (I2V). In contrast to previous methods that directly learn the …

ConsistI2V: Enhancing visual consistency for image-to-video generation

W Ren, H Yang, G Zhang, C Wei, X Du… - arXiv preprint arXiv …, 2024 - arxiv.org
Image-to-video (I2V) generation aims to use the initial frame (alongside a text prompt) to
create a video sequence. A grand challenge in I2V generation is to maintain visual …

Hallo: Hierarchical audio-driven visual synthesis for portrait image animation

M Xu, H Li, Q Su, H Shang, L Zhang, C Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of portrait image animation, driven by speech audio input, has experienced
significant advancements in the generation of realistic and dynamic portraits. This research …

AniClipart: Clipart animation with text-to-video priors

R Wu, W Su, K Ma, J Liao - International Journal of Computer Vision, 2024 - Springer
Clipart, a pre-made graphic art form, offers a convenient and efficient way of illustrating
visual content. Traditional workflows to convert static clipart images into motion sequences …

VBench++: Comprehensive and versatile benchmark suite for video generative models

Z Huang, F Zhang, X Xu, Y He, J Yu, Z Dong… - arXiv preprint arXiv …, 2024 - arxiv.org
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

MarDini: Masked autoregressive diffusion for video generation at scale

H Liu, S Liu, Z Zhou, M Xu, Y Xie, X Han… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce MarDini, a new family of video diffusion models that integrate the advantages
of masked auto-regression (MAR) into a unified diffusion model (DM) framework. Here, MAR …

Draw an audio: Leveraging multi-instruction for video-to-audio synthesis

Q Yang, B Mao, Z Wang, X Nie, P Gao, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Foley is a term commonly used in filmmaking, referring to the addition of daily sound effects
to silent films or videos to enhance the auditory experience. Video-to-Audio (V2A), as a …

VideoElevator: Elevating video generation quality with versatile text-to-image diffusion models

Y Zhang, Y Wei, X Lin, Z Hui, P Ren, X Xie, X Ji… - arXiv preprint arXiv …, 2024 - arxiv.org
Text-to-image diffusion models (T2I) have demonstrated unprecedented capabilities in
creating realistic and aesthetic images. On the contrary, text-to-video diffusion models (T2V) …

AtomoVideo: High fidelity image-to-video generation

L Gong, Y Zhu, W Li, X Kang, B Wang, T Ge… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, video generation has developed rapidly, building on superior text-to-image
generation techniques. In this work, we propose a high-fidelity framework for …

ObjCtrl-2.5D: Training-free object control with camera poses

Z Wang, Y Lan, S Zhou, CC Loy - arXiv preprint arXiv:2412.07721, 2024 - arxiv.org
This study aims to achieve more precise and versatile object control in image-to-video (I2V)
generation. Current methods typically represent the spatial movement of target objects with …