Selective listening by synchronizing speech with lips

Z Pan, R Tao, C Xu, H Li - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-
talker speech mixture when given a cue that represents the target speaker, such as a pre …

LC-TTFS: Towards lossless network conversion for spiking neural networks with TTFS coding

Q Yang, M Zhang, J Wu, KC Tan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The biological neurons use precise spike times, in addition to the spike firing rate, to
communicate with each other. The time-to-first-spike (TTFS) coding is inspired by such …

MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios

Q Zheng, Z Chen, H Liu, Y Lu, J Li, T Liu - Expert Systems with Applications, 2023 - Elsevier
Speaker embeddings have become the most popular feature representation in speaker
verification. Improving the robustness of speaker embedding extraction systems is a crucial …

L-SpEx: Localized target speaker extraction

M Ge, C Xu, L Wang, ES Chng… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
Speaker extraction aims to extract the target speaker's voice from a multi-talker speech
mixture given an auxiliary reference utterance. Recent studies show that speaker extraction …

Speech separation with pretrained frontend to minimize domain mismatch

W Wang, Z Pan, X Li, S Wang… - IEEE/ACM Transactions on …, 2024 - ieeexplore.ieee.org
Speech separation seeks to separate individual speech signals from a speech mixture.
Typically, most separation models are trained on synthetic data due to the unavailability of …

USED: Universal speaker extraction and diarization

J Ao, MS Yıldırım, R Tao, M Ge, S Wang… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Speaker extraction and diarization are two enabling techniques for real-world speech
applications. Speaker extraction aims to extract a target speaker's voice from a speech …

Speaker verification using attentive multi-scale convolutional recurrent network

Y Li, Z Jiang, W Cao, Q Huang - Applied Soft Computing, 2022 - Elsevier
In this paper, we propose a speaker verification method by an Attentive Multi-scale
Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local …

Few-shot speaker identification using lightweight prototypical network with feature grouping and interaction

Y Li, H Chen, W Cao, Q Huang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Existing methods for few-shot speaker identification (FSSI) obtain high accuracy, but their
computational complexities and model sizes need to be reduced for lightweight applications …

ACA-Net: Towards lightweight speaker verification using asymmetric cross attention

JQ Yip, T Truong, D Ng, C Zhang, Y Ma… - arXiv preprint arXiv …, 2023 - arxiv.org
In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding
extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric …

Improving curriculum learning for target speaker extraction with synthetic speakers

Y Liu, X Liu, J Yamagishi - 2024 IEEE Spoken Language …, 2024 - ieeexplore.ieee.org
Target speaker extraction (TSE) aims to isolate individual speaker voices from complex
speech environments. The effectiveness of TSE systems is often compromised when the …