VD3D: Taming large video diffusion transformers for 3D camera control

S Bahmani, I Skorokhodov, A Siarohin… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern text-to-video synthesis models demonstrate coherent, photorealistic generation of
complex videos from a text description. However, most existing models lack fine-grained …

Dreamitate: Real-world visuomotor policy learning via video generation

J Liang, R Liu, E Ozguroglu, S Sudhakar… - arXiv preprint arXiv …, 2024 - arxiv.org
A key challenge in manipulation is learning a policy that can robustly generalize to diverse
visual environments. A promising mechanism for learning robust policies is to leverage …

ReCapture: Generative video camera controls for user-provided videos using masked video fine-tuning

DJ Zhang, R Paiss, S Zada, N Karnad… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, breakthroughs in video modeling have allowed for controllable camera trajectories
in generated videos. However, these methods cannot be directly applied to user-provided …

DirectL: Efficient Radiance Fields Rendering for 3D Light Field Displays

Z Yang, B Liu, Y Song, L Yi, Y Xiong, Z Zhang… - ACM Transactions on …, 2024 - dl.acm.org
Autostereoscopic display technology, despite decades of development, has not achieved
extensive application, primarily due to the daunting challenge of three-dimensional (3D) …

Differentiable robot rendering

R Liu, A Canberk, S Song, C Vondrick - arXiv preprint arXiv:2410.13851, 2024 - arxiv.org
Vision foundation models trained on massive amounts of visual data have shown
unprecedented reasoning and planning skills in open-world settings. A key challenge in …

AC3D: Analyzing and Improving 3D Camera Control in Video Diffusion Transformers

S Bahmani, I Skorokhodov, G Qian, A Siarohin… - arXiv preprint arXiv …, 2024 - arxiv.org
Numerous works have recently integrated 3D camera control into foundational text-to-video
models, but the resulting camera control is often imprecise, and video generation quality …

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

J Bai, M Xia, X Wang, Z Yuan, X Fu, Z Liu, H Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in video diffusion models have shown exceptional abilities in
simulating real-world dynamics and maintaining 3D consistency. This progress inspires us …

MOVIS: Enhancing Multi-Object Novel View Synthesis for Indoor Scenes

R Lu, Y Chen, J Ni, B Jia, Y Liu, D Wan, G Zeng… - arXiv preprint arXiv …, 2024 - arxiv.org
Repurposing pre-trained diffusion models has been proven to be effective for novel view synthesis (NVS).
However, these methods are mostly limited to a single object; directly applying such …

SpatialDreamer: Self-supervised Stereo Video Synthesis from Monocular Input

Z Lv, Y Long, C Huang, C Li, C Lv, H Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
Stereo video synthesis from a monocular input is a demanding task in the fields of spatial
computing and virtual reality. The main challenges of this task lie in the insufficiency of high …

Pointmap-Conditioned Diffusion for Consistent Novel View Synthesis

TAQ Nguyen, N Piasco, L Roldão, M Bennehar… - arXiv preprint arXiv …, 2025 - arxiv.org
In this paper, we present PointmapDiffusion, a novel framework for single-image novel view
synthesis (NVS) that utilizes pre-trained 2D diffusion models. Our method is the first to …