SceneScript: Reconstructing Scenes with an Autoregressive Structured Language Model

A Avetisyan, C Xie, H Howard-Jenkins, TY Yang… - … on Computer Vision, 2024 - Springer
We introduce SceneScript, a method that directly produces full scene models as a sequence
of structured language commands using an autoregressive, token-based approach. Our …

SNAP: Self-supervised neural maps for visual positioning and semantic understanding

PE Sarlin, E Trulls, M Pollefeys… - Advances in Neural …, 2023 - proceedings.neurips.cc
Semantic 2D maps are commonly used by humans and machines for navigation purposes,
whether it's walking or driving. However, these maps have limitations: they lack detail, often …

Pixel-aligned recurrent queries for multi-view 3D object detection

Y Xie, H Jiang, G Gkioxari… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We present PARQ, a multi-view 3D object detector with transformer and pixel-aligned
recurrent queries. Unlike previous works that use learnable features or only encode 3D point …

3D vision with transformers: A survey

J Lahoud, J Cao, FS Khan, H Cholakkal… - arXiv preprint arXiv …, 2022 - arxiv.org
The success of the transformer architecture in natural language processing has recently
triggered attention in the computer vision field. The transformer has been used as a …

CAD-Estate: Large-scale CAD model annotation in RGB videos

KK Maninis, S Popov, M Nießner… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose a method for annotating videos of complex multi-object scenes with a globally-
consistent 3D representation of the objects. We annotate each object with a CAD model from …

RGB oralscan video-based orthodontic treatment monitoring

Y Tian, H Fu, H Wang, Y Liu, Z Xu, H Chen, J Li… - Science China …, 2024 - Springer
Orthodontic treatment monitoring involves using current images and previous 3D models to
estimate the relative position of individual teeth before and after orthodontic treatment. This …

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance

H Jiang, A Karpur, B Cao, Q Huang… - Proceedings of the …, 2024 - openaccess.thecvf.com
The image matching field has been witnessing a continuous emergence of novel learnable
feature matching techniques with ever-improving performance on conventional benchmarks …

Detection based part-level articulated object reconstruction from single RGBD image

Y Kawana, T Harada - Advances in Neural Information …, 2024 - proceedings.neurips.cc
We propose an end-to-end trainable, cross-category method for reconstructing multiple man-
made articulated objects from a single RGBD image, focusing on part-level shape …

LASA: Instance Reconstruction from Real Scans using A Large-scale Aligned Shape Annotation Dataset

H Liu, C Ye, Y Nie, Y He, X Han - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Instance shape reconstruction from a 3D scene involves recovering the full geometries of
multiple objects at the semantic instance level. Many methods leverage data-driven learning …

An analysis of precision: occlusion and perspective geometry's role in 6D pose estimation

J Choate, D Worth, S Nykl, C Taylor, B Borghetti… - Neural Computing and …, 2024 - Springer
Achieving precise 6 degrees of freedom (6D) pose estimation of rigid objects from color
images is a critical challenge with wide-ranging applications in robotics and close-contact …