Self-supervised audio-visual soundscape stylization
Speech sounds convey a great deal of information about the scenes, resulting in a variety of
effects ranging from reverberation to additional ambient sounds. In this paper, we …
Personalized PercepNet: Real-time, low-complexity target voice separation and enhancement
The presence of multiple talkers in the surrounding environment poses a difficult challenge
for real-time speech communication systems considering the constraints on network size …
Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers.
Speaker extraction has been studied mostly for the scenarios where a target speaker is
present in a two or more talkers mixture. Such scenarios do not adequately reflect everyday …
X-SepFormer: End-to-end speaker extraction network with explicit optimization on speaker confusion
K Liu, Z Du, X Wan, H Zhou - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Target speech extraction (TSE) systems are designed to extract target speech from a multi-
talker mixture. The popular training objective for most prior TSE networks is to enhance …
Single-channel blind source separation of spatial aliasing signal based on stacked-LSTM
M Zhao, X Yao, J Wang, Y Yan, X Gao, Y Fan - Sensors, 2021 - mdpi.com
Aiming at the problem of insufficient separation accuracy of aliased signals in space Internet
satellite-ground communication scenarios, a stacked long short-term memory network …
End-to-end Online Speaker Diarization with Target Speaker Tracking
This paper proposes an online target speaker voice activity detection system for speaker
diarization tasks, which does not require a priori knowledge from the clustering-based …
Attentive training: A new training framework for speech enhancement
Dealing with speech interference in a speech enhancement system requires either speaker
separation or target speaker extraction. Speaker separation has multiple output streams with …
SEF-Net: Speaker Embedding Free Target Speaker Extraction Network
Most target speaker extraction methods use the target speaker embedding as reference
information. However, the speaker embedding extracted by a speaker recognition module …
Quantitative evidence on overlooked aspects of enrollment speaker embeddings for target speaker separation
Single channel target speaker separation (TSS) aims at extracting a speaker's voice from a
mixture of multiple talkers given an enrollment utterance of that speaker. A typical deep …
USEF-TSE: Universal speaker embedding free target speaker extraction
B Zeng, M Li - arXiv preprint arXiv:2409.02615, 2024 - arxiv.org
Target speaker extraction aims to isolate the voice of a specific speaker from mixed speech.
Traditionally, this process has relied on extracting a speaker embedding from a reference …