SV3D: Novel multi-view synthesis and 3D generation from a single image using latent video diffusion

V Voleti, CH Yao, M Boss, A Letts, D Pankratz… - … on Computer Vision, 2024 - Springer
We present Stable Video 3D (SV3D), a latent video diffusion model for high-resolution, image-to-multi-view generation of orbital videos around a 3D object. Recent …

CamCo: Camera-controllable 3D-consistent image-to-video generation

D Xu, W Nie, C Liu, S Liu, J Kautz, Z Wang… - arxiv …

J Seo, K Fukuda, T Shibuya… - Advances in …, 2025 - proceedings.neurips.cc
Generating novel views from a single image remains a challenging task due to the
complexity of 3D scenes and the limited diversity in the existing multi-view datasets to train a …

Neural Assets: 3D-aware multi-object scene synthesis with image diffusion models

Z Wu, Y Rubanova, R Kabra… - Advances in …, 2025 - proceedings.neurips.cc
We address the problem of multi-object 3D pose control in image diffusion models. Instead
of conditioning on a sequence of text tokens, we propose to use a set of per-object …

VD3D: Taming large video diffusion transformers for 3D camera control

S Bahmani, I Skorokhodov, A Siarohin… - arxiv preprint arxiv …, 2024 - arxiv.org
Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of
complex videos from a text description. However, most existing models lack fine-grained …

LLMs meet multimodal generation and editing: A survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arxiv preprint arxiv …, 2024 - arxiv.org
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

Cascade-Zero123: One image to highly consistent 3D with self-prompted nearby views

Y Chen, J Fang, Y Huang, T Yi, X Zhang, L Xie… - … on Computer Vision, 2024 - Springer
Synthesizing multi-view 3D from one single image is a significant but challenging task. Zero-
1-to-3 methods have achieved great success by lifting a 2D latent diffusion model to the 3D …