Masked generative video-to-audio transformers with enhanced synchronicity
Video-to-audio (V2A) generation leverages visual-only video features to render
plausible sounds that match the scene. Importantly, the generated sound onsets should …
FoleyCrafter: Bring silent videos to life with lifelike and synchronized sounds
We study Neural Foley, the automatic generation of high-quality sound effects synchronizing
with videos, enabling an immersive audio-visual experience. Despite its wide range of …
Temporally aligned audio for video with autoregression
We introduce V-AURA, the first autoregressive model to achieve high temporal alignment
and relevance in video-to-audio generation. V-AURA uses a high-framerate visual feature …
FoleyGen: Visually-guided audio generation
Recent advancements in audio generation tasks, such as text-to-audio and text-to-music
generation, have been spurred by the evolution of deep learning models and large-scale …
Draw an audio: Leveraging multi-instruction for video-to-audio synthesis
Foley is a term commonly used in filmmaking, referring to the addition of daily sound effects
to silent films or videos to enhance the auditory experience. Video-to-Audio (V2A), as a …
Video-guided foley sound generation with multimodal controls
Generating sound effects for videos often requires creating artistic sound effects that diverge
significantly from real-life sources and flexible control in the sound design. To address this …
Artificial Taste: Advances and Innovative Applications in Healthcare
L Wang, Y Li, Y Zhang, B Zheng - Applied Sciences, 2025 - mdpi.com
Background: Scientists have recently developed a technology that induces artificial taste
through electronic stimulation. However, scattered reports have made it difficult to …
Taming multimodal joint training for high-quality video-to-audio synthesis
We propose to synthesize high-quality and synchronized audio, given video and optional
text conditions, using a novel multimodal joint training framework MMAudio. In contrast to …
Gotta hear them all: Sound source aware vision to audio generation
Vision-to-audio (V2A) synthesis has broad applications in multimedia. Recent
advancements of V2A methods have made it possible to generate relevant audios from …
VinTAGe: Joint video and text conditioning for holistic audio generation
Recent advances in audio generation have focused on text-to-audio (T2A) and video-to-
audio (V2A) tasks. However, T2A or V2A methods cannot generate holistic sounds …