Google Scholar

SpeechX: Neural codec language model as a versatile speech transformer

X Wang, M Thakker, Z Chen, N Kanda… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org

Recent advancements in generative speech models based on audio-text prompts have
enabled remarkable innovations like high-quality zero-shot text-to-speech. However …

Cited by 68, all 2 versions

[PDF] arxiv.org

Overview of speaker modeling and its applications: From the lens of deep speaker representation learning

S Wang, Z Chen, KA Lee, Y Qian… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

Speaker individuality information is among the most critical elements within speech signals.
By thoroughly and accurately modeling this information, it can be utilized in various …

Cited by 4, all 4 versions

[PDF] arxiv.org

WavTokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling

S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo… - arXiv preprint arXiv …, 2024 - arxiv.org

Language models have been effectively applied to modeling natural signals, such as
images, video, speech, and audio. A crucial component of these models is the codec …

Cited by 21, all 2 versions, HTML version

[PDF] arxiv.org

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

H Wu, X Chen, YC Lin, K Chang, J Du… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Neural audio codec models are becoming increasingly important as they serve as
tokenizers for audio, enabling efficient transmission or facilitating speech language …

Cited by 3, all 3 versions

[PDF] arxiv.org

The VoicePrivacy 2024 Challenge Evaluation Plan

N Tomashenko, X Miao, P Champion, S Meyer… - arXiv preprint arXiv …, 2024 - arxiv.org

The task of the challenge is to develop a voice anonymization system for speech data which
conceals the speaker's voice identity while protecting linguistic content and emotional states …

Cited by 96, all 23 versions, HTML version

[PDF] arxiv.org

Autoregressive speech synthesis without vector quantization

L Meng, L Zhou, S Liu, S Chen, B Han, S Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

We present MELLE, a novel continuous-valued tokens based language modeling approach
for text to speech synthesis (TTS). MELLE autoregressively generates continuous mel …

Cited by 21, all 2 versions, HTML version

[PDF] arxiv.org

Amphion: an Open-Source Audio, Music, and Speech Generation Toolkit

X Zhang, L Xue, Y Gu, Y Wang, J Li… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Amphion is an open-source toolkit for Audio, Music, and Speech Generation, targeting to
ease the way for junior researchers and engineers into these fields. It presents a unified …

Cited by 25, all 2 versions

[PDF] arxiv.org

E2 TTS: Embarrassingly easy fully non-autoregressive zero-shot TTS

SE Eskimez, X Wang, M Thakker, C Li… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully
non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and …

Cited by 19, all 3 versions

[PDF] arxiv.org

F5-TTS: A fairytaler that fakes fluent and faithful speech with flow matching

Y Chen, Z Niu, Z Ma, K Deng, C Wang, J Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper introduces F5-TTS, a fully non-autoregressive text-to-speech system based on
flow matching with Diffusion Transformer (DiT). Without requiring complex designs such as …

Cited by 17, all 2 versions, HTML version

[PDF] arxiv.org

Emilia: An extensive, multilingual, and diverse speech dataset for large-scale speech generation

H He, Z Shang, C Wang, X Li, Y Gu… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Recent advancements in speech generation models have been significantly driven by the
use of large-scale training data. However, producing highly spontaneous, human-like …

Cited by 21, all 3 versions

Saved to library

NaturalSpeech 3: Zero-shot speech synthesis with factorized codec and diffusion models

SpeechX: Neural codec language model as a versatile speech transformer
