3D Gaussian Splatting: Survey, technologies, challenges, and opportunities

Y Bao, T Ding, J Huo, Y Liu, Y Li, W Li… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
3D Gaussian Splatting (3DGS) has emerged as a prominent technique with the potential to
become a mainstream method for 3D representations. It can effectively transform multi-view …

Segment3D: Learning fine-grained class-agnostic 3D segmentation without manual labels

R Huang, S Peng, A Takmaz, F Tombari… - … on Computer Vision, 2024 - Springer
Current 3D scene segmentation methods are heavily dependent on manually annotated 3D
training datasets. Such manual annotations are labor-intensive, and often lack fine-grained …

P2P-Bridge: Diffusion Bridges for 3D Point Cloud Denoising

M Vogel, K Tateno, M Pollefeys, F Tombari… - … on Computer Vision, 2024 - Springer
In this work, we address the task of point cloud denoising using a novel framework adapting
Diffusion Schrödinger bridges to unstructured data like point sets. Unlike previous works that …

Track4Gen: Teaching Video Diffusion Models to Track Points Improves Video Generation

H Jeong, CHP Huang, JC Ye, N Mitra… - arXiv preprint arXiv …, 2024 - arxiv.org
While recent foundational video generators produce visually rich output, they still struggle
with appearance drift, where objects gradually degrade or change inconsistently across …

DeSiRe-GS: 4D street Gaussians for static-dynamic decomposition and surface reconstruction for urban driving scenes

C Peng, C Zhang, Y Wang, C Xu, Y Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
We present DeSiRe-GS, a self-supervised Gaussian splatting representation, enabling
effective static-dynamic decomposition and high-fidelity surface reconstruction in complex …

Feat2GS: Probing Visual Foundation Models with Gaussian Splatting

Y Chen, X Chen, A Chen, G Pons-Moll… - arXiv preprint arXiv …, 2024 - arxiv.org
Given that visual foundation models (VFMs) are trained on extensive datasets but often
limited to 2D images, a natural question arises: how well do they understand the 3D world …

DepthCues: Evaluating Monocular Depth Perception in Large Vision Models

D Danier, M Aygün, C Li, H Bilen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale pre-trained vision models are becoming increasingly prevalent, offering
expressive and generalizable visual representations that benefit various downstream tasks …

Multiview Equivariance Improves 3D Correspondence Understanding with Minimal Feature Finetuning

Y You, Y Li, C Deng, Y Wang, L Guibas - arXiv preprint arXiv:2411.19458, 2024 - arxiv.org
Vision foundation models, particularly the ViT family, have revolutionized image
understanding by providing rich semantic features. However, despite their success in 2D …

On Unifying Video Generation and Camera Pose Estimation

CHP Huang, JS Yoon, H Jeong, N Mitra… - arXiv preprint arXiv …, 2025 - arxiv.org
Inspired by the emergent 3D capabilities in image generators, we explore whether video
generators similarly exhibit 3D awareness. Using structure-from-motion (SfM) as a …

BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization

Q Wang, S Wu, Y Shi - arXiv preprint arXiv:2502.09080, 2025 - arxiv.org
This paper addresses the problem of weakly supervised cross-view localization, where the
goal is to estimate the pose of a ground camera relative to a satellite image with noisy …