TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos

Y Wang, Z Wang, L Liu, K Daniilidis - European Conference on Computer …, 2024 - Springer
We propose TRAM, a two-stage method to reconstruct a human's global trajectory and
motion from in-the-wild videos. TRAM robustifies SLAM to recover the camera motion in the …

From an Image to a Scene: Learning to Imagine the World from a Million 360° Videos

M Wallingford, A Bhattad, A Kusupati… - Advances in …, 2025 - proceedings.neurips.cc
Three-dimensional (3D) understanding of objects and scenes plays a key role in
humans' ability to interact with the world and has been an active area of research in …

MegaSaM: Accurate, Fast, and Robust Structure and Motion from Casual Dynamic Videos

Z Li, R Tucker, F Cole, Q Wang, L Jin, V Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
We present a system that allows for accurate, fast, and robust estimation of camera
parameters and depth maps from casual monocular videos of dynamic scenes. Most …

Graph-Guided Scene Reconstruction from Images with 3D Gaussian Splatting

C Cheng, G Song, Y Yao, Q Zhou, G Zhang… - arXiv preprint arXiv …, 2025 - arxiv.org
This paper investigates an open research challenge of reconstructing high-quality, large 3D
open scenes from images. It is observed that existing methods have various limitations, such as …

MASt3R-SfM: a Fully-Integrated Solution for Unconstrained Structure-from-Motion

B Duisterhof, L Zust, P Weinzaepfel, V Leroy… - arXiv preprint arXiv …, 2024 - arxiv.org
Structure-from-Motion (SfM), a task aiming at jointly recovering camera poses and 3D
geometry of a scene given a set of images, remains a hard problem with still many open …
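As context for the task statement above, and not as a formulation specific to MASt3R-SfM, the classical SfM objective is usually written as a bundle adjustment over camera poses and 3D points; the notation below (projection $\pi$, intrinsics $K_i$, robust loss $\rho$, observation set $\mathcal{O}$) is assumed here purely for illustration:

\min_{\{R_i, t_i\},\, \{X_j\}} \; \sum_{(i,j) \in \mathcal{O}} \rho\!\left( \bigl\| \pi\!\left( K_i \left( R_i X_j + t_i \right) \right) - x_{ij} \bigr\|^{2} \right)

where $(R_i, t_i)$ is the pose of camera $i$, $X_j$ a 3D scene point, and $x_{ij}$ the corresponding 2D keypoint observed in image $i$.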

Stereo4D: Learning How Things Move in 3D from Internet Stereo Videos

L Jin, R Tucker, Z Li, D Fouhey, N Snavely… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning to understand dynamic 3D scenes from imagery is crucial for applications ranging
from robotics to scene reconstruction. Yet, unlike other problems where large-scale …

Feat2GS: Probing Visual Foundation Models with Gaussian Splatting

Y Chen, X Chen, A Chen, G Pons-Moll… - arXiv preprint arXiv …, 2024 - arxiv.org
Given that visual foundation models (VFMs) are trained on extensive datasets but often
limited to 2D images, a natural question arises: how well do they understand the 3D world …

Generative Multiview Relighting for 3D Reconstruction under Extreme Illumination Variation

H Alzayer, P Henzler, JT Barron, JB Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Reconstructing the geometry and appearance of objects from photographs taken in different
environments is difficult as the illumination and therefore the object appearance vary across …

D2S: Representing sparse descriptors and 3D coordinates for camera relocalization

BT Bui, HH Bui, DT Tran, JH Lee - IEEE Robotics and …, 2024 - ieeexplore.ieee.org
State-of-the-art visual localization methods mostly rely on complex procedures to match
local descriptors against 3D point clouds. However, these procedures can incur significant costs …
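The snippet above refers to the classical pipeline of matching local descriptors against a 3D point cloud and then solving for pose; a minimal sketch of that pipeline (not the D2S method itself, with hypothetical input arrays assumed) looks like this in Python with OpenCV:

import numpy as np
import cv2

def relocalize(query_desc, query_kpts, map_desc, map_xyz, K):
    """Illustrative matching-plus-PnP relocalization sketch (not D2S).
    query_desc: (N, D) float32 descriptors of the query image
    query_kpts: (N, 2) pixel coordinates of the query keypoints
    map_desc:   (M, D) float32 descriptors attached to the 3D map
    map_xyz:    (M, 3) 3D coordinates of the map points
    K:          (3, 3) camera intrinsic matrix
    """
    # 1) Nearest-neighbour matching between query and map descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(query_desc.astype(np.float32), map_desc.astype(np.float32))
    pts2d = np.float32([query_kpts[m.queryIdx] for m in matches])
    pts3d = np.float32([map_xyz[m.trainIdx] for m in matches])
    # 2) Robust pose estimation from the resulting 2D-3D correspondences.
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
    return ok, rvec, tvec, inliers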

Exploring Matching Rates: From Keypoint Selection to Camera Relocalization

H Lin, C Long, Y Fei, Q Xia, E Yin, B Yin… - Proceedings of the 32nd …, 2024 - dl.acm.org
Camera relocalization is the challenging task of estimating the camera pose within a known scene,
with wide applications in the fields of Virtual Reality (VR), Augmented Reality (AR), robotics …