- Academic Search

Depth Anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com

This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

Cited by 585 · 6 versions

[PDF] arxiv.org

Cambrian-1: A fully open, vision-centric exploration of multimodal LLMs

S Tong, E Brown, P Wu, S Woo, M Middepogu… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-
centric approach. While stronger language models can enhance multimodal capabilities, the …

Cited by 172 · 4 versions

[PDF] arxiv.org

Sapiens: Foundation for human vision models

R Khirodkar, T Bagautdinov, J Martinez… - … on Computer Vision, 2024 - Springer

We present Sapiens, a family of models for four fundamental human-centric vision tasks: 2D
pose estimation, body-part segmentation, depth estimation, and surface normal prediction …

Cited by 30 · 3 versions

[HTML] acm.org

RGB↔X: Image decomposition and synthesis using material- and lighting-aware diffusion models

Z Zeng, V Deschaintre, I Georgiev… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org

The three areas of realistic forward rendering, per-pixel inverse rendering, and generative
image synthesis may seem like separate and unrelated sub-fields of graphics and vision …

Cited by 19 · 2 versions

[HTML] acm.org

A construct-optimize approach to sparse view synthesis without camera pose

K Jiang, Y Fu, M Varma T, Y Belhe, X Wang… - ACM SIGGRAPH 2024 …, 2024 - dl.acm.org

Novel view synthesis from a sparse set of input images is a challenging problem of great
practical interest, especially when camera poses are absent or inaccurate. Direct …

Cited by 10 · 2 versions

[PDF] arxiv.org

SceneWiz3D: Towards text-guided 3D scene composition

Q Zhang, C Wang, A Siarohin, P Zhuang, Y Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

We are witnessing significant breakthroughs in the technology for generating 3D objects
from text. Existing approaches either leverage large text-to-image models to optimize a 3D …

Cited by 23 · 2 versions

[PDF] arxiv.org

Open-Sora Plan: Open-source large video generation model

B Lin, Y Ge, X Cheng, Z Li, B Zhu, S Wang, X He… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce Open-Sora Plan, an open-source project that aims to contribute a large
generation model for generating desired high-resolution videos with long durations based …

Cited by 9 · 2 versions

[PDF] arxiv.org

4K4DGen: Panoramic 4D generation at 4K resolution

R Li, P Pan, B Yang, D Xu, S Zhou, X Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org

The blooming of virtual reality and augmented reality (VR/AR) technologies has driven an
increasing demand for the creation of high-quality, immersive, and dynamic environments …

Cited by 11

[PDF] thecvf.com

LightIt: Illumination modeling and control for diffusion models

P Kocsis, J Philip, K Sunkavalli… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce LightIt, a method for explicit illumination control for image generation. Recent
generative methods lack lighting control, which is crucial to numerous artistic aspects of …

Cited by 11 · 3 versions

[PDF] arxiv.org

Learning 3D Geometry and Feature Consistent Gaussian Splatting for Object Removal

Y Wang, Q Wu, G Zhang, D Xu - European Conference on Computer …, 2024 - Springer

This paper tackles the intricate challenge of object removal to update the radiance field
using the 3D Gaussian Splatting. The main challenges of this task lie in the preservation of …

Cited by 5 · 2 versions

Saved to My library

MiDaS v3.1 -- a model zoo for robust monocular relative depth estimation

Depth Anything: Unleashing the power of large-scale unlabeled data
