Motion Prompting: Controlling Video Generation with Motion Trajectories
Motion control is crucial for generating expressive and compelling video content; however,
most existing video generation models rely mainly on text prompts for control, which struggle …
Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation
While recent foundational video generators produce visually rich output, they still struggle
with appearance drift, where objects gradually degrade or change inconsistently across …
DAViD: Modeling Dynamic Affordance of 3D Objects using Pre-trained Video Diffusion Models
H Kim, S Beak, H Joo - arXiv preprint arXiv:2501.08333, 2025 - arxiv.org
Understanding the ability of humans to use objects is crucial for AI to improve daily life.
Existing studies for learning such ability focus on human-object patterns (e.g., contact, spatial …
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Predicting the dynamics of interacting objects is essential for both humans and intelligent
systems. However, existing approaches are limited to simplified, toy settings and lack …
FramePainter: Endowing Interactive Image Editing with Video Diffusion Priors
Interactive image editing allows users to modify images through visual interaction operations
such as drawing, clicking, and dragging. Existing methods construct such supervision …
VFX Creator: Animated Visual Effect Generation with Controllable Diffusion Transformer
Crafting magic and illusions is one of the most thrilling aspects of filmmaking, with visual
effects (VFX) serving as the powerhouse behind unforgettable cinematic experiences. While …
Track-On: Transformer-based Online Point Tracking with Memory
In this paper, we consider the problem of long-term point tracking, which requires consistent
identification of points across multiple frames in a video, despite changes in appearance …
TAPTRv3: Spatial and Temporal Context Foster Robust Tracking of Any Point in Long Video
In this paper, we present TAPTRv3, which is built upon TAPTRv2 to improve its point
tracking robustness in long videos. TAPTRv2 is a simple DETR-like framework that can …
Improving Vision-Language-Action Models via Chain-of-Affordance
Robot foundation models, particularly Vision-Language-Action (VLA) models, have
garnered significant attention for their ability to enhance robot policy learning, greatly …
Exploring Temporally-Aware Features for Point Tracking
Point tracking in videos is a fundamental task with applications in robotics, video editing, and
more. While many vision tasks benefit from pre-trained feature backbones to improve …