- Academic Search

Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

Cited by 4

NeuroHeed: Neuro-steered speaker extraction using EEG signals

Z Pan, M Borsdorf, S Cai, T Schultz… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Humans possess the remarkable ability to selectively attend to a single speaker amidst
competing voices and background noise, known as selective auditory attention. Recent …

Cited by 15

Enhancing code-switching speech recognition with interactive language biases

H Liu, LP Garcia, X Zhang, AWH Khong… - ICASSP 2024-2024 …, 2024 - ieeexplore.ieee.org

Languages usually switch within a multilingual speech signal, especially in a bilingual
society. This phenomenon is referred to as code-switching (CS), making automatic speech …

Cited by 14

USED: Universal speaker extraction and diarization

J Ao, MS Yıldırım, R Tao, M Ge, S Wang… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Speaker extraction and diarization are two enabling techniques for real-world speech
applications. Speaker extraction aims to extract a target speaker's voice from a speech …

Cited by 5

Speech foundation model ensembles for the controlled singing voice deepfake detection (CtrSVDD) challenge 2024

A Guragain, T Liu, Z Pan, HB Sailor… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

This work details our approach to achieving a leading system with a 1.79% pooled equal
error rate (EER) on the evaluation set of the Controlled Singing Voice Deepfake Detection …

Cited by 3

Multi-stage Face-voice Association Learning with Keynote Speaker Diarization

R Tao, Z Shi, Y Jiang, DT Truong, ES Chng… - Proceedings of the …, 2024 - dl.acm.org

The human brain has the capability to associate the unknown person's voice and face by
leveraging their general relationship, referred to as "cross-modal speaker verification". This …

Cited by 2

Text-Queried Target Sound Event Localization

J Zhao, X Qian, Y Xu, H Liu, Y Cao… - 2024 32nd …, 2024 - ieeexplore.ieee.org

Sound event localization and detection (SELD) aims to determine the appearance of sound
classes, together with their Direction of Arrival (DOA). However, current SELD systems can …

Cited by 1

DENSE: Dynamic Embedding Causal Target Speech Extraction

Y Wang, Z Yuan, X Wu - arXiv preprint arXiv:2409.06136, 2024 - arxiv.org

Target speech extraction (TSE) focuses on extracting the speech of a specific target speaker
from a mixture of signals. Existing TSE models typically utilize static embeddings as …

Cited by 1

Target Speech Diarization with Multimodal Prompts

Y Jiang, R Tao, Z Chen, Y Qian, H Li - arXiv preprint arXiv:2406.07198, 2024 - arxiv.org

Traditional speaker diarization seeks to detect "who spoke when" according to speaker
characteristics. Extending to target speech diarization, we detect "when target event …

Cited by 7

A Synopsis of FAME 2024 Challenge: Associating Faces with Voices in Multilingual Environments

MS Saeed, S Nawaz, M Moscati, RK Das… - Proceedings of the …, 2024 - dl.acm.org

Over half of the world's population is bilingual and people often communicate under
multilingual scenarios. The Face-Voice Association in Multilingual Environments (FAME) …
