Google Scholar

V Voleti, CH Yao, M Boss, A Letts, D Pankratz… - … on Computer Vision, 2024 - Springer

We present Stable Video 3D (SV3D), a latent video diffusion model for high-resolution,
image-to-multi-view generation of orbital videos around a 3D object. Recent …

Cited by 124 (6 versions)

[PDF] arxiv.org

Instantmesh: Efficient 3d mesh generation from a single image with sparse-view large reconstruction models

J Xu, W Cheng, Y Gao, X Wang, S Gao… - ar** foundation 3D …

Cited by 27 (9 versions)

[PDF] arxiv.org

Camco: Camera-controllable 3d-consistent image-to-video generation

D Xu, W Nie, C Liu, S Liu, J Kautz, Z Wang… - ar**

J Seo, K Fukuda, T Shibuya… - Advances in …, 2025 - proceedings.neurips.cc

Generating novel views from a single image remains a challenging task due to the
complexity of 3D scenes and the limited diversity in the existing multi-view datasets to train a …

Cited by 10 (4 versions)

[PDF] neurips.cc

Neural assets: 3d-aware multi-object scene synthesis with image diffusion models

Z Wu, Y Rubanova, R Kabra… - Advances in …, 2025 - proceedings.neurips.cc

We address the problem of multi-object 3D pose control in image diffusion models. Instead
of conditioning on a sequence of text tokens, we propose to use a set of per-object …

Cited by 8 (7 versions)

[PDF] arxiv.org

Vd3d: Taming large video diffusion transformers for 3d camera control

S Bahmani, I Skorokhodov, A Siarohin… - arxiv preprint arxiv …, 2024 - arxiv.org

Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of
complex videos from a text description. However, most existing models lack fine-grained …

Cited by 24 (3 versions)

[PDF] arxiv.org

Llms meet multimodal generation and editing: A survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arxiv preprint arxiv …, 2024 - arxiv.org

With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

Cited by 18 (3 versions)

[PDF] arxiv.org

Cascade-zero123: One image to highly consistent 3d with self-prompted nearby views

Y Chen, J Fang, Y Huang, T Yi, X Zhang, L Xie… - … on Computer Vision, 2024 - Springer

Synthesizing multi-view 3D from one single image is a significant but challenging task. Zero-
1-to-3 methods have achieved great success by lifting a 2D latent diffusion model to the 3D …

Cited by 10 (7 versions)


Spad: Spatially aware multi-view diffusers

Sv3d: Novel multi-view synthesis and 3d generation from a single image using latent video diffusion
