NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

Selective listening by synchronizing speech with lips

Z Pan, R Tao, C Xu, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
A speaker extraction algorithm seeks to extract the speech of a target speaker from a
multi-talker speech mixture when given a cue that represents the target speaker, such as a pre …

X-SepFormer: End-to-end speaker extraction network with explicit optimization on speaker confusion

K Liu, Z Du, X Wan, H Zhou - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Target speech extraction (TSE) systems are designed to extract target speech from a
multi-talker mixture. The popular training objective for most prior TSE networks is to enhance …

Target speaker extraction by directly exploiting contextual information in the time-frequency domain

X Yang, C Bao, J Zhou, X Chen - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org
In target speaker extraction, many studies rely on the speaker embedding which is obtained
from an enrollment of the target speaker and employed as the guidance. However, solely …

SEF-Net: Speaker embedding free target speaker extraction network

B Zeng, H Suo, Y Wan, M Li - Proc. Interspeech, 2023 - isca-archive.org
Most target speaker extraction methods use the target speaker embedding as reference
information. However, the speaker embedding extracted by a speaker recognition module …

X-TF-GridNet: A time–frequency domain target speaker extraction network with adaptive speaker embedding fusion

F Hao, X Li, C Zheng - Information Fusion, 2024 - Elsevier
Target speaker extraction (TSE) which has the capability to directly extract desired speech
given enrollment utterances of the target speaker has attracted more and more attention for …

Self-supervised disentangled representation learning for robust target speech extraction

Z Mu, X Yang, S Sun, Q Yang - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Speech signals are inherently complex as they encompass both global acoustic
characteristics and local semantic information. However, in the task of target speech …

Speech enhancement with fullband-subband cross-attention network

J Chen, W Rao, Z Wang, Z Wu, Y Wang, T Yu… - arXiv preprint arXiv …, 2022 - arxiv.org
FullSubNet has shown its promising performance on speech enhancement by utilizing both
fullband and subband information. However, the relationship between fullband and subband …

Target speaker extraction with ultra-short reference speech by VE-VE framework

L Yang, W Liu, L Tan, J Yang… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
The goal of target speaker extraction (TSE) is to extract the target speaker's voice from the
mixture speech of multiple speakers. It needs to enroll the speech of the target speaker in …

MC-SpEx: Towards effective speaker extraction with multi-scale interfusion and conditional speaker modulation

J Chen, W Rao, Z Wang, J Lin, Y Ju, S He… - arXiv preprint arXiv …, 2023 - arxiv.org
The previous SpEx+ has yielded outstanding performance in speaker extraction and
attracted much attention. However, it still encounters inadequate utilization of multi-scale …