Motion-I2V: Consistent and controllable image-to-video generation with explicit motion modeling

X Shi, Z Huang, FY Wang, W Bian, D Li… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
We introduce Motion-I2V, a novel framework for consistent and controllable text-guided
image-to-video generation (I2V). In contrast to previous methods that directly learn the …

Diffusion model-based video editing: A survey

W Sun, RC Tu, J Liao, D Tao - arXiv preprint arXiv:2407.07111, 2024 - arxiv.org
The rapid development of diffusion models (DMs) has significantly advanced image and
video applications, making "what you want is what you see" a reality. Among these, video …

BootsTAP: Bootstrapped training for tracking-any-point

C Doersch, P Luc, Y Yang, D Gokay… - Proceedings of the …, 2024 - openaccess.thecvf.com
To endow models with greater understanding of physics and motion, it is useful to enable
them to perceive how solid surfaces move and deform in real scenes. This can be formalized …

ETO: Efficient transformer-based local feature matching by organizing multiple homography hypotheses

J Ni, G Zhang, G Li, Y Li, X Liu, Z Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
We tackle the efficiency problem of learning local feature matching. Recent advancements
have given rise to purely CNN-based and transformer-based approaches, each augmented …

ZoLA: Zero-Shot Creative Long Animation Generation with Short Video Model

FY Wang, Z Huang, Q Ma, G Song, X Lu, W Bian… - … on Computer Vision, 2024 - Springer
Although video generation has made great progress in capacity and controllability and is
gaining increasing attention, currently available video generation models still make minimal …

GS-DiT: Advancing Video Generation with Pseudo 4D Gaussian Fields through Efficient Dense 3D Point Tracking

W Bian, Z Huang, X Shi, Y Li, FY Wang, H Li - arXiv preprint arXiv …, 2025 - arxiv.org
4D video control is essential in video generation as it enables the use of sophisticated lens
techniques, such as multi-camera shooting and dolly zoom, which are currently unsupported …

A Global Depth-Range-Free Multi-View Stereo Transformer Network with Pose Embedding

Y Dong, Y Li, Z Huang, W Bian, J Liu, H Bao… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we propose a novel multi-view stereo (MVS) framework that gets rid of the
depth range prior. Unlike recent prior-free MVS methods that work in a pair-wise manner …

Event-Based Tracking Any Point with Motion-Augmented Temporal Consistency

H Han, W Zhai, Y Cao, B Li, Z Zha - arXiv preprint arXiv:2412.01300, 2024 - arxiv.org
Tracking Any Point (TAP) plays a crucial role in motion analysis. Video-based approaches
rely on iterative local matching for tracking, but they assume linear motion during the blind …

EgoPoints: Advancing Point Tracking for Egocentric Videos

A Darkhalil, R Guerrier, AW Harley… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce EgoPoints, a benchmark for point tracking in egocentric videos. We annotate
4.7K challenging tracks in egocentric sequences. Compared to the popular TAP-Vid-DAVIS …

Event-aided Dense and Continuous Point Tracking

Z Wan, J Luo, Y Dai, GH Lee - openreview.net
Recent point tracking methods have made great strides in recovering the trajectories of any
point (especially key points) in long video sequences associated with large motions …