Images that sound: Composing images and sounds on a single canvas

Z Chen, D Geng, A Owens - Advances in Neural …, 2025 - proceedings.neurips.cc
Spectrograms are 2D representations of sound that look very different from the images found
in our visual world. And natural images, when played as spectrograms, make unnatural …

Video-guided foley sound generation with multimodal controls

Z Chen, P Seetharaman, B Russell, O Nieto… - arXiv preprint arXiv …, 2024 - arxiv.org
Generating sound effects for videos often requires creating artistic sound effects that diverge
significantly from real-life sources and flexible control in the sound design. To address this …

Adaptive Perception for Unified Visual Multi-modal Object Tracking

X Hu, B Zhong, Q Liang, Z Mo, L Shi, Y Tai… - arXiv preprint arXiv …, 2025 - arxiv.org
Recently, many multi-modal trackers prioritize RGB as the dominant modality, treating other
modalities as auxiliary, and fine-tuning separately various multi-modal tasks. This imbalance …