VideoComposer: Compositional video synthesis with motion controllability
The pursuit of controllability as a higher standard of visual content creation has yielded
remarkable progress in customizable image synthesis. However, achieving controllable …
MVSplat: Efficient 3D Gaussian splatting from sparse multi-view images
We introduce MVSplat, an efficient model that, given sparse multi-view images as input,
predicts clean feed-forward 3D Gaussians. To accurately localize the Gaussian centers, we …
Rerender A Video: Zero-shot text-guided video-to-video translation
Large text-to-image diffusion models have exhibited impressive proficiency in generating
high-quality images. However, when applying these models to the video domain, ensuring …
Unifying flow, stereo and depth estimation
We present a unified formulation and model for three motion and 3D perception tasks:
optical flow, rectified stereo matching and unrectified stereo depth estimation from posed …
FlowFormer++: Masked cost volume autoencoding for pretraining optical flow estimation
FlowFormer introduces a transformer architecture into optical flow estimation and achieves
state-of-the-art performance. The core component of FlowFormer is the transformer-based …
TAPIR: Tracking any point with per-frame initialization and temporal refinement
We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried
point on any physical surface throughout a video sequence. Our approach employs two …
NICER-SLAM: Neural implicit scene encoding for RGB SLAM
Neural implicit representations have recently become popular in simultaneous localization
and mapping (SLAM), especially in dense visual SLAM. However, existing works either rely …
A dynamic multi-scale voxel flow network for video prediction
The performance of video prediction has been greatly boosted by advanced deep neural
networks. However, most of the current methods suffer from large model sizes and require …
DINO-Tracker: Taming DINO for self-supervised point tracking in a single video
We present DINO-Tracker–a new framework for long-term dense tracking in video. The pillar
of our approach is combining test-time training on a single video with the powerful localized …
VideoFlow: Exploiting temporal cues for multi-frame optical flow estimation
We introduce VideoFlow, a novel optical flow estimation framework for videos. In contrast to
previous methods that learn to estimate optical flow from two frames, VideoFlow concurrently …