Depth Anything: Unleashing the power of large-scale unlabeled data
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …
Cambrian-1: A fully open, vision-centric exploration of multimodal LLMs
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-
centric approach. While stronger language models can enhance multimodal capabilities, the …
Sapiens: Foundation for human vision models
We present Sapiens, a family of models for four fundamental human-centric vision tasks: 2D
pose estimation, body-part segmentation, depth estimation, and surface normal prediction …
RGB↔X: Image decomposition and synthesis using material- and lighting-aware diffusion models
The three areas of realistic forward rendering, per-pixel inverse rendering, and generative
image synthesis may seem like separate and unrelated sub-fields of graphics and vision …
A construct-optimize approach to sparse view synthesis without camera pose
Novel view synthesis from a sparse set of input images is a challenging problem of great
practical interest, especially when camera poses are absent or inaccurate. Direct …
SceneWiz3D: Towards text-guided 3D scene composition
We are witnessing significant breakthroughs in the technology for generating 3D objects
from text. Existing approaches either leverage large text-to-image models to optimize a 3D …
Open-Sora Plan: Open-source large video generation model
We introduce Open-Sora Plan, an open-source project that aims to contribute a large
generation model for generating desired high-resolution videos with long durations based …
4K4DGen: Panoramic 4D generation at 4K resolution
The blooming of virtual reality and augmented reality (VR/AR) technologies has driven an
increasing demand for the creation of high-quality, immersive, and dynamic environments …
LightIt: Illumination modeling and control for diffusion models
We introduce LightIt, a method for explicit illumination control for image generation. Recent
generative methods lack lighting control, which is crucial to numerous artistic aspects of …
Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal
This paper tackles the intricate challenge of object removal to update the radiance field
using 3D Gaussian Splatting. The main challenges of this task lie in the preservation of …