High-Quality Visually-Guided Sound Separation from Diverse Categories
We propose DAVIS, a Diffusion-based Audio-VIsual Separation framework that solves the
audio-visual sound source separation task through generative learning. Existing methods …
AVE Speech Dataset: A Comprehensive Benchmark for Multi-Modal Speech Recognition Integrating Audio, Visual, and Electromyographic Signals
D Zhou, Y Zhang, J Wu, X Zhang, L Xie… - arXiv preprint arXiv …, 2025 - arxiv.org
The global aging population faces considerable challenges, particularly in communication,
due to the prevalence of hearing and speech impairments. To address these, we introduce …
Diffusion-based Unsupervised Audio-visual Speech Enhancement
This paper proposes a new unsupervised audiovisual speech enhancement (AVSE)
approach that combines a diffusion-based audio-visual speech generative model with a non …
Multi-Model Dual-Transformer Network for Audio-Visual Speech Enhancement
Visual features offer important cues that can be used in noisy backgrounds. Audio-visual
speech enhancement (AVSE) improves speech quality and intelligibility by combining audio …