Selective listening by synchronizing speech with lips
A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-
talker speech mixture when given a cue that represents the target speaker, such as a pre …
LC-TTFS: Towards lossless network conversion for spiking neural networks with TTFS coding
The biological neurons use precise spike times, in addition to the spike firing rate, to
communicate with each other. The time-to-first-spike (TTFS) coding is inspired by such …
MSRANet: Learning discriminative embeddings for speaker verification via channel and spatial attention mechanism in alterable scenarios
Speaker embeddings have become the most popular feature representation in speaker
verification. Improving the robustness of speaker embedding extraction systems is a crucial …
L-SpEx: Localized target speaker extraction
Speaker extraction aims to extract the target speaker's voice from a multi-talker speech
mixture given an auxiliary reference utterance. Recent studies show that speaker extraction …
Speech separation with pretrained frontend to minimize domain mismatch
Speech separation seeks to separate individual speech signals from a speech mixture.
Typically, most separation models are trained on synthetic data due to the unavailability of …
USED: Universal speaker extraction and diarization
Speaker extraction and diarization are two enabling techniques for real-world speech
applications. Speaker extraction aims to extract a target speaker's voice from a speech …
Speaker verification using attentive multi-scale convolutional recurrent network
Y Li, Z Jiang, W Cao, Q Huang - Applied Soft Computing, 2022 - Elsevier
In this paper, we propose a speaker verification method by an Attentive Multi-scale
Convolutional Recurrent Network (AMCRN). The proposed AMCRN can acquire both local …
Few-shot speaker identification using lightweight prototypical network with feature grouping and interaction
Y Li, H Chen, W Cao, Q Huang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Existing methods for few-shot speaker identification (FSSI) obtain high accuracy, but their
computational complexities and model sizes need to be reduced for lightweight applications …
ACA-Net: Towards lightweight speaker verification using asymmetric cross attention
In this paper, we propose ACA-Net, a lightweight, global context-aware speaker embedding
extractor for Speaker Verification (SV) that improves upon existing work by using Asymmetric …
Improving curriculum learning for target speaker extraction with synthetic speakers
Target speaker extraction (TSE) aims to isolate individual speaker voices from complex
speech environments. The effectiveness of TSE systems is often compromised when the …