Hallo2: Long-duration and high-resolution audio-driven portrait image animation

J Cui, H Li, Y Yao, H Zhu, H Shang, K Cheng… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in latent diffusion-based generative models for portrait image animation,
such as Hallo, have achieved impressive results in short-duration video synthesis. In this …

DimensionX: Create any 3D and 4D scenes from a single image with controllable video diffusion

W Sun, S Chen, F Liu, Z Chen, Y Duan, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we introduce DimensionX, a framework designed to generate
photorealistic 3D and 4D scenes from just a single image with video diffusion. Our approach …

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

S Yuan, J Huang, X He, Y Ge, Y Shi, L Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Identity-preserving text-to-video (IPT2V) generation aims to create high-fidelity videos with
consistent human identity. It is an important task in video generation but remains an open …

SG-I2V: Self-guided trajectory control in image-to-video generation

K Namekata, S Bahmani, Z Wu, Y Kant… - arXiv preprint arXiv …, 2024 - arxiv.org
Methods for image-to-video generation have achieved impressive, photo-realistic quality.
However, adjusting specific elements in generated videos, such as object motion or camera …

Motion Prompting: Controlling Video Generation with Motion Trajectories

D Geng, C Herrmann, J Hur, F Cole, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Motion control is crucial for generating expressive and compelling video content; however,
most existing video generation models rely mainly on text prompts for control, which struggle …

Boosting camera motion control for video diffusion transformers

SY Cheong, D Ceylan, A Mustafa, A Gilbert… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in diffusion models have significantly enhanced the quality of video
generation. However, fine-grained control over camera pose remains a challenge. While U …

LeviTor: 3D Trajectory Oriented Image-to-Video Synthesis

H Wang, H Ouyang, Q Wang, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
The intuitive nature of drag-based interaction has led to its growing adoption for controlling
object trajectories in image-to-video synthesis. Still, existing methods that perform dragging …

AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

S Bahmani, I Skorokhodov, G Qian, A Siarohin… - arXiv preprint arXiv …, 2024 - arxiv.org
Numerous works have recently integrated 3D camera control into foundational text-to-video
models, but the resulting camera control is often imprecise, and video generation quality …

CPA: Camera-pose-awareness Diffusion Transformer for Video Generation

Y Wang, J Zhang, P Jiang, H Zhang, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the significant advancements made by Diffusion Transformer (DiT)-based methods
in video generation, there remains a notable gap with controllable camera pose …

SeedVR: Seeding Infinity in Diffusion Transformer Towards Generic Video Restoration

J Wang, Z Lin, M Wei, Y Zhao, C Yang, CC Loy… - arXiv preprint arXiv …, 2025 - arxiv.org
Video restoration poses non-trivial challenges in maintaining fidelity while recovering
temporally consistent details from unknown degradations in the wild. Despite recent …