Self-supervised audio-visual soundscape stylization

T Li, R Wang, PY Huang, A Owens… - … on Computer Vision, 2024 - Springer
Speech sounds convey a great deal of information about the scenes, resulting in a variety of
effects ranging from reverberation to additional ambient sounds. In this paper, we …

Personalized PercepNet: Real-time, low-complexity target voice separation and enhancement

R Giri, S Venkataramani, JM Valin, U Isik… - arXiv preprint arXiv …, 2021 - arxiv.org
The presence of multiple talkers in the surrounding environment poses a difficult challenge
for real-time speech communication systems considering the constraints on network size …

Universal Speaker Extraction in the Presence and Absence of Target Speakers for Speech of One and Two Talkers.

M Borsdorf, C Xu, H Li, T Schultz - Interspeech, 2021 - isca-archive.org
Speaker extraction has been studied mostly for the scenarios where a target speaker is
present in a two or more talkers mixture. Such scenarios do not adequately reflect everyday …

X-SepFormer: End-to-end speaker extraction network with explicit optimization on speaker confusion

K Liu, Z Du, X Wan, H Zhou - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Target speech extraction (TSE) systems are designed to extract target speech from a multi-
talker mixture. The popular training objective for most prior TSE networks is to enhance …

Single-channel blind source separation of spatial aliasing signal based on stacked-LSTM

M Zhao, X Yao, J Wang, Y Yan, X Gao, Y Fan - Sensors, 2021 - mdpi.com
Aiming at the problem of insufficient separation accuracy of aliased signals in space Internet
satellite-ground communication scenarios, a stacked long short-term memory network …

End-to-end Online Speaker Diarization with Target Speaker Tracking

W Wang, M Li - arXiv preprint arXiv:2310.08696, 2023 - arxiv.org
This paper proposes an online target speaker voice activity detection system for speaker
diarization tasks, which does not require a priori knowledge from the clustering-based …

Attentive training: A new training framework for speech enhancement

A Pandey, DL Wang - IEEE/ACM Transactions on Audio, Speech …, 2023 - ieeexplore.ieee.org
Dealing with speech interference in a speech enhancement system requires either speaker
separation or target speaker extraction. Speaker separation has multiple output streams with …

SEF-Net: Speaker Embedding Free Target Speaker Extraction Network

B Zeng, H Suo, Y Wan, M Li - Proc. Interspeech, 2023 - isca-archive.org
Most target speaker extraction methods use the target speaker embedding as reference
information. However, the speaker embedding extracted by a speaker recognition module …

Quantitative evidence on overlooked aspects of enrollment speaker embeddings for target speaker separation

X Liu, X Li, J Serrà - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
Single channel target speaker separation (TSS) aims at extracting a speaker's voice from a
mixture of multiple talkers given an enrollment utterance of that speaker. A typical deep …

USEF-TSE: Universal speaker embedding free target speaker extraction

B Zeng, M Li - arXiv preprint arXiv:2409.02615, 2024 - arxiv.org
Target speaker extraction aims to isolate the voice of a specific speaker from mixed speech.
Traditionally, this process has relied on extracting a speaker embedding from a reference …