Drive-1-to-3: Enriching diffusion priors for novel view synthesis of real vehicles
The recent advent of large-scale 3D data, e.g., Objaverse, has led to impressive progress in
training pose-conditioned diffusion models for novel view synthesis. However, due to the …
DisMix: Disentangling Mixtures of Musical Instruments for Source-level Pitch and Timbre Manipulation
Existing work on pitch and timbre disentanglement has been mostly focused on single-
instrument music audio, excluding the cases where multiple instruments are presented. To …
Dream to Manipulate: Compositional World Models Empowering Robot Imitation Learning with Imagination
A world model provides an agent with a representation of its environment, enabling it to
predict the causal consequences of its actions. Current world models typically cannot …
MObI: Multimodal Object Inpainting Using Diffusion Models
Safety-critical applications, such as autonomous driving, require extensive multimodal data
for rigorous testing. Methods based on synthetic data are gaining prominence due to the …
Disentangling Multi-instrument Music Audio for Source-level Pitch and Timbre Manipulation
Disentangling pitch and timbre from the audio of a musical instrument involves encoding
these two attributes as separate latent representations, allowing the synthesis of instrument …