Depth Anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

Cambrian-1: A fully open, vision-centric exploration of multimodal LLMs

S Tong, E Brown, P Wu, S Woo, M Middepogu… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-
centric approach. While stronger language models can enhance multimodal capabilities, the …

Sapiens: Foundation for human vision models

R Khirodkar, T Bagautdinov, J Martinez… - … on Computer Vision, 2024 - Springer
We present Sapiens, a family of models for four fundamental human-centric vision tasks: 2D
pose estimation, body-part segmentation, depth estimation, and surface normal prediction …

RGB↔X: Image decomposition and synthesis using material- and lighting-aware diffusion models

Z Zeng, V Deschaintre, I Georgiev… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
The three areas of realistic forward rendering, per-pixel inverse rendering, and generative
image synthesis may seem like separate and unrelated sub-fields of graphics and vision …

A construct-optimize approach to sparse view synthesis without camera pose

K Jiang, Y Fu, M Varma T, Y Belhe, X Wang… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org
Novel view synthesis from a sparse set of input images is a challenging problem of great
practical interest, especially when camera poses are absent or inaccurate. Direct …

SceneWiz3D: Towards text-guided 3D scene composition

Q Zhang, C Wang, A Siarohin, P Zhuang, Y Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
We are witnessing significant breakthroughs in the technology for generating 3D objects
from text. Existing approaches either leverage large text-to-image models to optimize a 3D …

Open-Sora Plan: Open-source large video generation model

B Lin, Y Ge, X Cheng, Z Li, B Zhu, S Wang, X He… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Open-Sora Plan, an open-source project that aims to contribute a large
generation model for generating desired high-resolution videos with long durations based …

4K4DGen: Panoramic 4D generation at 4K resolution

R Li, P Pan, B Yang, D Xu, S Zhou, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The blooming of virtual reality and augmented reality (VR/AR) technologies has driven an
increasing demand for the creation of high-quality, immersive, and dynamic environments …

LightIt: Illumination modeling and control for diffusion models

P Kocsis, J Philip, K Sunkavalli… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce LightIt, a method for explicit illumination control for image generation. Recent
generative methods lack lighting control, which is crucial to numerous artistic aspects of …

Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal

Y Wang, Q Wu, G Zhang, D Xu - European Conference on Computer …, 2024 - Springer
This paper tackles the intricate challenge of object removal to update the radiance field
using 3D Gaussian Splatting. The main challenges of this task lie in the preservation of …