Large spatial model: End-to-end unposed images to semantic 3D
Reconstructing and understanding 3D structures from a limited number of images is a
classical problem in computer vision. Traditional approaches typically decompose this task …
CoTracker3: Simpler and better point tracking by pseudo-labelling real videos
Most state-of-the-art point trackers are trained on synthetic data due to the difficulty of
annotating real videos for this task. However, this can result in suboptimal performance due …
MVSplat360: Feed-forward 360 scene synthesis from sparse views
We introduce MVSplat360, a feed-forward approach for 360° novel view synthesis
(NVS) of diverse real-world scenes, using only sparse observations. This setting is …
AnimateAnything: Consistent and controllable animation for video generation
We present a unified controllable video generation approach AnimateAnything that
facilitates precise and consistent video manipulation across various conditions, including …
Can Visual Foundation Models Achieve Long-term Point Tracking?
Large-scale vision foundation models have demonstrated remarkable success across
various tasks, underscoring their robust generalization capabilities. While their proficiency in …
Megasam: Accurate, fast, and robust structure and motion from casual dynamic videos
We present a system that allows for accurate, fast, and robust estimation of camera
parameters and depth maps from casual monocular videos of dynamic scenes. Most …
UniHOI: Learning Fast, Dense and Generalizable 4D Reconstruction for Egocentric Hand Object Interaction Videos
Egocentric Hand Object Interaction (HOI) videos provide valuable insights into human
interactions with the physical world, attracting growing interest from the computer vision and …
Continuous 3D Perception Model with Persistent State
We present a unified framework capable of solving a broad range of 3D tasks. Our approach
features a stateful recurrent model that continuously updates its state representation with …
Georecon: a coarse-to-fine visual 3D reconstruction approach for high-resolution images with neural matching priors
Visual 3D reconstruction enables rebuilding 3D scenes from captured images, serving as a
fundamental data source for digital earth modeling and intelligent cities. In the foundational …
MegaSynth: Scaling Up 3D Scene Reconstruction with Synthesized Data
We propose scaling up 3D scene reconstruction by training with synthesized data. At the
core of our work is MegaSynth, a procedurally generated 3D dataset comprising 700K …