Google Scholar

Neural target speech extraction: An overview

K Zmolikova, M Delcroix, T Ochiai… - IEEE Signal …, 2023 - ieeexplore.ieee.org

Humans can listen to a target speaker even in challenging acoustic conditions that have
noise, reverberation, and interfering speakers. This phenomenon is known as the cocktail …

Cited by 90 · All 5 versions

[PDF] arxiv.org

UniAudio: An audio foundation model toward universal audio generation

D Yang, J Tian, X Tan, R Huang, S Liu, X Chang… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language models (LLM) have demonstrated the capability to handle a variety of
generative tasks. This paper presents the UniAudio system, which, unlike prior task-specific …

Cited by 110 · All 3 versions

[PDF] arxiv.org

SpeechX: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

Cited by 69 · All 5 versions

[PDF] arxiv.org

Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

Cited by 5 · All 4 versions

[PDF] openreview.net

UniAudio: Towards universal audio generation with large language models

D Yang, J Tian, X Tan, R Huang, S Liu… - … on Machine Learning, 2024 - openreview.net

Audio generation is a major branch of generative AI research. Compared with prior works in
this area that are commonly task-specific with heavy domain knowledge, this paper …

Cited by 12 · All 6 versions

[PDF] arxiv.org

SpEx+: A complete time domain speaker extraction network

M Ge, C Xu, L Wang, ES Chng, J Dang, H Li - arXiv preprint arXiv …, 2020 - arxiv.org

Speaker extraction aims to extract the target speech signal from a multi-talker environment
given a target speaker's reference speech. We recently proposed a time-domain solution …

Cited by 170 · All 8 versions

[PDF] ieee.org

Deep learning-based non-intrusive multi-objective speech assessment model with cross-domain features

RE Zezario, SW Fu, F Chen, CS Fuh… - … on Audio, Speech …, 2022 - ieeexplore.ieee.org

This study proposes a cross-domain multi-objective speech assessment model, called
MOSA-Net, which can simultaneously estimate the speech quality, intelligibility, and …

Cited by 87 · All 7 versions

[PDF] arxiv.org

Speech enhancement using self-adaptation and multi-head self-attention

Y Koizumi, K Yatabe, M Delcroix… - ICASSP 2020-2020 …, 2020 - ieeexplore.ieee.org

This paper investigates a self-adaptation method for speech enhancement using auxiliary
speaker-aware features; we extract a speaker representation used for adaptation directly …

Cited by 158 · All 7 versions

[PDF] arxiv.org

Far-field automatic speech recognition

R Haeb-Umbach, J Heymann, L Drude… - Proceedings of the …, 2020 - ieeexplore.ieee.org

The machine recognition of speech spoken at a distance from the microphones, known as
far-field automatic speech recognition (ASR), has received a significant increase in attention …

Cited by 120 · All 8 versions

[PDF] acm.org

Look once to hear: Target speech hearing with noisy examples

B Veluri, M Itani, T Chen, T Yoshioka… - Proceedings of the 2024 …, 2024 - dl.acm.org

In crowded settings, the human brain can focus on speech from a target speaker, given prior
knowledge of how they sound. We introduce a novel intelligent hearable system that …

Cited by 13 · All 5 versions

Speakerbeam: Speaker aware neural network for target speaker extraction in speech mixtures

Neural target speech extraction: An overview
