SceneScript: Reconstructing Scenes with an Autoregressive Structured Language Model
We introduce SceneScript, a method that directly produces full scene models as a sequence
of structured language commands using an autoregressive, token-based approach. Our …
SNAP: Self-supervised neural maps for visual positioning and semantic understanding
Semantic 2D maps are commonly used by humans and machines for navigation purposes,
whether it's walking or driving. However, these maps have limitations: they lack detail, often …
Pixel-aligned recurrent queries for multi-view 3D object detection
We present PARQ, a multi-view 3D object detector with transformer and pixel-aligned
recurrent queries. Unlike previous works that use learnable features or only encode 3D point …
3D vision with transformers: A survey
The success of the transformer architecture in natural language processing has recently
triggered attention in the computer vision field. The transformer has been used as a …
CAD-Estate: Large-scale CAD model annotation in RGB videos
We propose a method for annotating videos of complex multi-object scenes with a globally-
consistent 3D representation of the objects. We annotate each object with a CAD model from …
RGB oralscan video-based orthodontic treatment monitoring
Y Tian, H Fu, H Wang, Y Liu, Z Xu, H Chen, J Li… - Science China …, 2024 - Springer
Orthodontic treatment monitoring involves using current images and previous 3D models to
estimate the relative position of individual teeth before and after orthodontic treatment. This …
OmniGlue: Generalizable Feature Matching with Foundation Model Guidance
The image matching field has been witnessing a continuous emergence of novel learnable
feature matching techniques with ever-improving performance on conventional benchmarks …
Detection based part-level articulated object reconstruction from single RGBD image
Y Kawana, T Harada - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We propose an end-to-end trainable, cross-category method for reconstructing multiple man-
made articulated objects from a single RGBD image, focusing on part-level shape …
LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset
Instance shape reconstruction from a 3D scene involves recovering the full geometries of
multiple objects at the semantic instance level. Many methods leverage data-driven learning …
An analysis of precision: occlusion and perspective geometry's role in 6D pose estimation
Achieving precise 6 degrees of freedom (6D) pose estimation of rigid objects from color
images is a critical challenge with wide-ranging applications in robotics and close-contact …