Vintage: Joint video and text conditioning for holistic audio generation
Recent advances in audio generation have focused on text-to-audio (T2A) and video-to-
audio (V2A) tasks. However, T2A or V2A methods cannot generate holistic sounds …