High-Quality Visually-Guided Sound Separation from Diverse Categories

C Huang, S Liang, Y Tian… - Proceedings of the …, 2024 - openaccess.thecvf.com
We propose DAVIS, a Diffusion-based Audio-VIsual Separation framework that solves the
audio-visual sound source separation task through generative learning. Existing methods …

AVE Speech Dataset: A Comprehensive Benchmark for Multi-Modal Speech Recognition Integrating Audio, Visual, and Electromyographic Signals

D Zhou, Y Zhang, J Wu, X Zhang, L Xie… - arXiv preprint arXiv …, 2025 - arxiv.org
The global aging population faces considerable challenges, particularly in communication,
due to the prevalence of hearing and speech impairments. To address these, we introduce …

Diffusion-based Unsupervised Audio-visual Speech Enhancement

JE Ayilo, M Sadeghi, R Serizel… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper proposes a new unsupervised audiovisual speech enhancement (AVSE)
approach that combines a diffusion-based audio-visual speech generative model with a non …

Multi-Model Dual-Transformer Network for Audio-Visual Speech Enhancement

FE Wahab, N Saleem, A Hussain, R Ullah… - 3rd COG-MHEAR …, 2024 - isca-archive.org
Visual features offer important cues that can be used in noisy backgrounds. Audio-visual
speech enhancement (AVSE) improves speech quality and intelligibility by combining audio …