NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

Prompt-driven target speech diarization

Y Jiang, Z Chen, R Tao, L Deng… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
We introduce a novel task named 'target speech diarization', which seeks to determine
'when target event occurred' within an audio signal. We devise a neural architecture called …

Multi-Level Speaker Representation for Target Speaker Extraction

K Zhang, J Li, S Wang, Y Wei, Y Wang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Target speaker extraction (TSE) relies on a reference cue of the target to extract the target
speech from a speech mixture. While a speaker embedding is commonly used as the …

Audio-visual target speaker extraction with reverse selective auditory attention

R Tao, X Qian, Y Jiang, J Li, J Wang, H Li - arXiv preprint arXiv …, 2024 - arxiv.org
Audio-visual target speaker extraction (AV-TSE) aims to extract the specific person's speech
from the audio mixture given auxiliary visual cues. Previous methods usually search for the …

Audio-visual active speaker extraction for sparsely overlapped multi-talker speech

J Li, R Tao, Z Pan, M Ge, S Wang… - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
Target speaker extraction aims to extract the speech of a specific speaker from a multi-talker
mixture as specified by an auxiliary reference. Most studies focus on the scenario where the …

Scenario-aware audio-visual TF-GridNet for target speech extraction

Z Pan, G Wichern, Y Masuyama… - 2023 IEEE Automatic …, 2023 - ieeexplore.ieee.org
Target speech extraction aims to extract, based on a given conditioning cue, a target speech
signal that is corrupted by interfering sources, such as noise or competing speakers …

On the effectiveness of enrollment speech augmentation for Target Speaker Extraction

J Li, K Zhang, S Wang, H Li, MW Mak… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Deep learning technologies have significantly advanced the performance of target speaker
extraction (TSE) tasks. To enhance the generalization and robustness of these algorithms …

Enhancing Speaker Extraction Through Rectifying Target Confusion

J Wang, S Wang, J Li, K Zhang… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Target Speaker Extraction (TSE) aims to extract target speech from mixed audio using clues
that identify the target speaker. However, TSE often faces the Target Confusion (TC) …

Target Speech Extraction with Pre-trained AV-HuBERT and Mask-And-Recover Strategy

W Wu, X Chen, X Wu, H Li… - 2024 International Joint …, 2024 - ieeexplore.ieee.org
Audio-visual target speech extraction (AV-TSE) is one of the enabling technologies in
robotics and many audiovisual applications. One of the challenges of AV-TSE is how to …

Audio-Visual Target Speaker Extraction with Selective Auditory Attention

R Tao, X Qian, Y Jiang, J Li, J Wang… - IEEE Transactions on …, 2025 - ieeexplore.ieee.org
Audio-visual target speaker extraction (AV-TSE) aims to extract the specific person's speech
from the audio mixture given auxiliary visual cues. Previous methods usually search for the …