Restoring speaking lips from occlusion for audio-visual speech recognition

J Wang, Z Pan, M Zhang, RT Tan, H Li - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Prior studies on audio-visual speech recognition typically assume the visibility of speaking
lips, ignoring the fact that visual occlusion occurs in real-world videos, thus adversely …

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

AV-SepFormer: Cross-attention sepformer for audio-visual target speaker extraction

J Lin, X Cai, H Dinkel, J Chen, Z Yan… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
Visual information can serve as an effective cue for target speaker extraction (TSE) and is
vital to improving extraction performance. In this paper, we propose AV-SepFormer, a …

MSFNet: Multi-scale fusion network for brain-controlled speaker extraction

C Fan, J Zhang, H Zhang, W Xiang, J Tao, X Li… - Proceedings of the …, 2024 - dl.acm.org
Speaker extraction aims to selectively extract the target speaker from the multi-talker
environment under the guidance of auxiliary reference. Recent studies have shown that the …

Speaker extraction with co-speech gestures cue

Z Pan, X Qian, H Li - IEEE Signal Processing Letters, 2022 - ieeexplore.ieee.org
Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker
mixture speech. There have been studies to use a pre-recorded speech sample or face …

Time-domain speech separation networks with graph encoding auxiliary

T Wang, Z Pan, M Ge, Z Yang… - IEEE Signal Processing …, 2023 - ieeexplore.ieee.org
End-to-end time-domain speech separation with masking strategy has shown its
performance advantage, where a 1-D convolutional layer is used as the speech encoder to …

NeuroHeed+: Improving neuro-steered speaker extraction with joint auditory attention detection

Z Pan, G Wichern, FG Germain… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Neuro-steered speaker extraction aims to extract the listener's brain-attended speech signal
from a multi-talker speech signal, in which the attention is derived from the cortical activity …

Rethinking the visual cues in audio-visual speaker extraction

J Li, M Ge, R Cao, L Wang, J Dang, S Zhang - arXiv preprint arXiv …, 2023 - arxiv.org
The Audio-Visual Speaker Extraction (AVSE) algorithm employs parallel video recording to
leverage two visual cues, namely speaker identity and synchronization, to enhance …

USED: Universal speaker extraction and diarization

J Ao, MS Yıldırım, R Tao, M Ge, S Wang… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Speaker extraction and diarization are two enabling techniques for real-world speech
applications. Speaker extraction aims to extract a target speaker's voice from a speech …

New insights on target speaker extraction

M Elminshawi, W Mack, SR Chetupalli… - arXiv preprint arXiv …, 2022 - arxiv.org
Speaker extraction (SE) aims to segregate the speech of a target speaker from a mixture of
interfering speakers with the help of auxiliary information. Several forms of auxiliary …