Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

Enhancing code-switching speech recognition with interactive language biases

H Liu, LP Garcia, X Zhang, AWH Khong… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org
Languages usually switch within a multilingual speech signal, especially in a bilingual
society. This phenomenon is referred to as code-switching (CS), making automatic speech …

USED: Universal speaker extraction and diarization

J Ao, MS Yıldırım, R Tao, M Ge, S Wang… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Speaker extraction and diarization are two enabling techniques for real-world speech
applications. Speaker extraction aims to extract a target speaker's voice from a speech …

Speech foundation model ensembles for the controlled singing voice deepfake detection (CtrSVDD) challenge 2024

A Guragain, T Liu, Z Pan, HB Sailor… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
This work details our approach to achieving a leading system with a 1.79% pooled equal
error rate (EER) on the evaluation set of the Controlled Singing Voice Deepfake Detection …

Multi-stage Face-voice Association Learning with Keynote Speaker Diarization

R Tao, Z Shi, Y Jiang, DT Truong, ES Chng… - Proceedings of the …, 2024 - dl.acm.org
The human brain has the capability to associate the unknown person's voice and face by
leveraging their general relationship, referred to as "cross-modal speaker verification". This …

Text-Queried Target Sound Event Localization

J Zhao, X Qian, Y Xu, H Liu, Y Cao… - 2024 32nd …, 2024 - ieeexplore.ieee.org
Sound event localization and detection (SELD) aims to determine the appearance of sound
classes, together with their Direction of Arrival (DOA). However, current SELD systems can …

DENSE: Dynamic Embedding Causal Target Speech Extraction

Y Wang, Z Yuan, X Wu - arXiv preprint arXiv:2409.06136, 2024 - arxiv.org
Target speech extraction (TSE) focuses on extracting the speech of a specific target speaker
from a mixture of signals. Existing TSE models typically utilize static embeddings as …

Target Speech Diarization with Multimodal Prompts

Y Jiang, R Tao, Z Chen, Y Qian, H Li - arXiv preprint arXiv:2406.07198, 2024 - arxiv.org
Traditional speaker diarization seeks to detect "who spoke when" according to speaker
characteristics. Extending to target speech diarization, we detect "when target event …

A Synopsis of FAME 2024 Challenge: Associating Faces with Voices in Multilingual Environments

MS Saeed, S Nawaz, M Moscati, RK Das… - Proceedings of the …, 2024 - dl.acm.org
Over half of the world's population is bilingual and people often communicate under
multilingual scenarios. The Face-Voice Association in Multilingual Environments (FAME) …