Motion Prompting: Controlling Video Generation with Motion Trajectories

D Geng, C Herrmann, J Hur, F Cole, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Motion control is crucial for generating expressive and compelling video content; however,
most existing video generation models rely mainly on text prompts for control, which struggle …

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

H Jeong, CHP Huang, JC Ye, N Mitra… - arXiv preprint arXiv …, 2024 - arxiv.org
While recent foundational video generators produce visually rich output, they still struggle
with appearance drift, where objects gradually degrade or change inconsistently across …

DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models

H Kim, S Beak, H Joo - arXiv preprint arXiv:2501.08333, 2025 - arxiv.org
Understanding the human ability to use objects is crucial for AI systems that aim to improve daily life.
Existing studies for learning such ability focus on human-object patterns (e.g., contact, spatial …

InterDyn: Controllable Interactive Dynamics with Video Diffusion Models

R Akkerman, H Feng, MJ Black, D Tzionas… - arXiv preprint arXiv …, 2024 - arxiv.org
Predicting the dynamics of interacting objects is essential for both humans and intelligent
systems. However, existing approaches are limited to simplified, toy settings and lack …

FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors

Y Zhang, X Zhou, Y Zeng, H Xu, H Li, W Zuo - arXiv preprint arXiv …, 2025 - arxiv.org
Interactive image editing allows users to modify images through visual interaction operations
such as drawing, clicking, and dragging. Existing methods construct such supervision …

VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer

X Liu, A Zeng, W Xue, H Yang, W Luo, Q Liu… - arXiv preprint arXiv …, 2025 - arxiv.org
Crafting magic and illusions is one of the most thrilling aspects of filmmaking, with visual
effects (VFX) serving as the powerhouse behind unforgettable cinematic experiences. While …

Track-On: Transformer-based Online Point Tracking with Memory

G Aydemir, X Cai, W **e, F Güney - arxiv preprint arxiv:2501.18487, 2025 - arxiv.org
In this paper, we consider the problem of long-term point tracking, which requires consistent
identification of points across multiple frames in a video, despite changes in appearance …

TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video

J Qu, H Li, S Liu, T Ren, Z Zeng, L Zhang - arXiv preprint arXiv:2411.18671, 2024 - arxiv.org
In this paper, we present TAPTRv3, which is built upon TAPTRv2 to improve its point
tracking robustness in long videos. TAPTRv2 is a simple DETR-like framework that can …

Improving Vision-Language-Action Models via Chain-of-Affordance

J Li, Y Zhu, Z Tang, J Wen, M Zhu, X Liu, C Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Robot foundation models, particularly Vision-Language-Action (VLA) models, have
garnered significant attention for their ability to enhance robot policy learning, greatly …

Exploring Temporally-Aware Features for Point Tracking

IH Kim, S Cho, J Huang, J Yi, JY Lee, S Kim - arXiv preprint arXiv …, 2025 - arxiv.org
Point tracking in videos is a fundamental task with applications in robotics, video editing, and
more. While many vision tasks benefit from pre-trained feature backbones to improve …