Neural target speech extraction: An overview

K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org
Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …

UniAudio: An audio foundation model toward universal audio generation

D Yang, J Tian, X Tan, R Huang, S Liu, X Chang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated the capability to handle a variety of
generative tasks. This paper presents the UniAudio system, which, unlike prior task-specific …

SpeechX: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org
Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

UniAudio: Towards universal audio generation with large language models

D Yang, J Tian, X Tan, R Huang, S Liu… - … on Machine Learning, 2024 - openreview.net
Audio generation is a major branch of generative AI research. Compared with prior works in
this area that are commonly task-specific with heavy domain knowledge, this paper …

SpEx+: A complete time domain speaker extraction network

M Ge, C Xu, L Wang, ES Chng, J Dang, H Li - arXiv preprint arXiv …, 2020 - arxiv.org
Speaker extraction aims to extract the target speech signal from a multi-talker environment
given a target speaker's reference speech. We recently proposed a time-domain solution …

Deep learning-based non-intrusive multi-objective speech assessment model with cross-domain features

RE Zezario, SW Fu, F Chen, CS Fuh… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org
This study proposes a cross-domain multi-objective speech assessment model, called
MOSA-Net, which can simultaneously estimate the speech quality, intelligibility, and …

Speech enhancement using self-adaptation and multi-head self-attention

Y Koizumi, K Yatabe, M Delcroix… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org
This paper investigates a self-adaptation method for speech enhancement using auxiliary
speaker-aware features; we extract a speaker representation used for adaptation directly …

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org
The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

Look once to hear: Target speech hearing with noisy examples

B Veluri, M Itani, T Chen, T Yoshioka… - Proceedings of the 2024 …, 2024 - dl.acm.org
In crowded settings, the human brain can focus on speech from a target speaker, given prior
knowledge of how they sound. We introduce a novel intelligent hearable system that …