Google Scholar

High-Quality Visually-Guided Sound Separation from Diverse Categories

C Huang, S Liang, Y Tian… - Proceedings of the …, 2024 - openaccess.thecvf.com

We propose DAVIS, a Diffusion-based Audio-VIsual Separation framework that solves the
audio-visual sound source separation task through generative learning. Existing methods …

[PDF] arxiv.org

AVE Speech Dataset: A Comprehensive Benchmark for Multi-Modal Speech Recognition Integrating Audio, Visual, and Electromyographic Signals

D Zhou, Y Zhang, J Wu, X Zhang, L Xie… - arXiv preprint arXiv …, 2025 - arxiv.org

The global aging population faces considerable challenges, particularly in communication,
due to the prevalence of hearing and speech impairments. To address these, we introduce …

[PDF] arxiv.org

Diffusion-based Unsupervised Audio-visual Speech Enhancement

JE Ayilo, M Sadeghi, R Serizel… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper proposes a new unsupervised audiovisual speech enhancement (AVSE)
approach that combines a diffusion-based audio-visual speech generative model with a non …

[PDF] isca-archive.org

[PDF] Multi-Model Dual-Transformer Network for Audio-Visual Speech Enhancement

FE Wahab, N Saleem, A Hussain, R Ullah… - 3rd COG-MHEAR …, 2024 - isca-archive.org

Visual features offer important cues that can be used in noisy backgrounds. Audio-visual
speech enhancement (AVSE) improves speech quality and intelligibility by combining audio …

Cited by 1

Saved to "My library"

AV2WAV: Diffusion-Based Re-Synthesis from Continuous Self-Supervised Features for Audio-Visual...

High-Quality Visually-Guided Sound Separation from Diverse Categories