Drive-1-to-3: Enriching diffusion priors for novel view synthesis of real vehicles

C Lin, B Zhuang, S Sun, Z Jiang, J Cai… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent advent of large-scale 3D data, e.g., Objaverse, has led to impressive progress in
training pose-conditioned diffusion models for novel view synthesis. However, due to the …

DisMix: Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation

YJ Luo, KW Cheuk, W Choi, T Uesaka… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing work on pitch and timbre disentanglement has been mostly focused on
single-instrument music audio, excluding the cases where multiple instruments are presented. To …

Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination

L Barcellona, A Zadaianchuk, D Allegro, S Papa… - arXiv preprint arXiv …, 2024 - arxiv.org
A world model provides an agent with a representation of its environment, enabling it to
predict the causal consequences of its actions. Current world models typically cannot …

MObI: Multimodal Object Inpainting Using Diffusion Models

A Buburuzan, A Sharma, J Redford, PK Dokania… - arXiv preprint arXiv …, 2025 - arxiv.org
Safety-critical applications, such as autonomous driving, require extensive multimodal data
for rigorous testing. Methods based on synthetic data are gaining prominence due to the …

Disentangling Multi-instrument Music Audio for Source-level Pitch and Timbre Manipulation

YJ Luo, KW Cheuk, W Choi, WH Liao, K Toyama… - … NeurIPS 2024 Workshop … - openreview.net
Disentangling pitch and timbre from the audio of a musical instrument involves encoding
these two attributes as separate latent representations, allowing the synthesis of instrument …