Google Scholar

Towards audio language modeling--an overview

H Wu, X Chen, YC Lin, K Chang, HL Chung… - arXiv preprint arXiv …, 2024 - arxiv.org

Neural audio codecs are initially introduced to compress audio data into compact codes to
reduce transmission latency. Researchers recently discovered the potential of codecs as …

Cited by 29 · All 3 versions · HTML version

[PDF] arxiv.org

Vall-e 2: Neural codec language models are human parity zero-shot text to speech synthesizers

S Chen, S Liu, L Zhou, Y Liu, X Tan, J Li, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper introduces VALL-E 2, the latest advancement in neural codec language models
that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity …

Cited by 60 · All 3 versions · HTML version

[PDF] arxiv.org

Funcodec: A fundamental, reproducible and integrable open-source toolkit for neural speech codec

Z Du, S Zhang, K Hu, S Zheng - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

This paper presents FunCodec, a fundamental neural speech codec toolkit, which is an
extension of the open-source speech processing toolkit FunASR. FunCodec provides …

Cited by 54 · All 3 versions

[PDF] arxiv.org

Wavtokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling

S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo… - arXiv preprint arXiv …, 2024 - arxiv.org

Language models have been effectively applied to modeling natural signals, such as
images, video, speech, and audio. A crucial component of these models is the codec …

Cited by 27 · All 3 versions · HTML version

[PDF] arxiv.org

Codec-SUPERB @ SLT 2024: A lightweight benchmark for neural audio codec models

H Wu, X Chen, YC Lin, K Chang, J Du… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

Neural audio codec models are becoming increasingly important as they serve as
tokenizers for audio, enabling efficient transmission or facilitating speech language …

Cited by 4 · All 4 versions

[PDF] arxiv.org

Advancing large language models to capture varied speaking styles and respond properly in spoken conversations

GT Lin, CH Chiang, H Lee - arXiv preprint arXiv:2402.12786, 2024 - arxiv.org

In spoken dialogue, even if two current turns are the same sentence, their responses might
still differ when they are spoken in different styles. The spoken styles, containing …

Cited by 19 · All 6 versions · HTML version

[PDF] arxiv.org

F5-tts: A fairytaler that fakes fluent and faithful speech with flow matching

Y Chen, Z Niu, Z Ma, K Deng, C Wang, J Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper introduces F5-TTS, a fully non-autoregressive text-to-speech system based on
flow matching with Diffusion Transformer (DiT). Without requiring complex designs such as …

Cited by 22 · All 2 versions · HTML version

[PDF] arxiv.org

Repcodec: A speech representation codec for speech tokenization

Z Huang, C Meng, T Ko - arXiv preprint arXiv:2309.00169, 2023 - arxiv.org

With recent rapid growth of large language models (LLMs), discrete speech tokenization has
played an important role for injecting speech into LLMs. However, this discretization gives …

Cited by 25 · All 6 versions · HTML version

[PDF] arxiv.org

Codec-SUPERB: An in-depth analysis of sound codec models

H Wu, HL Chung, YC Lin, YK Wu, X Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

The sound codec's dual roles in minimizing data transmission latency and serving as
tokenizers underscore its critical importance. Recent years have witnessed significant …

Cited by 16 · All 4 versions · HTML version

[PDF] arxiv.org

APCodec: A neural audio codec with parallel amplitude and phase spectrum encoding and decoding

Y Ai, XH Jiang, YX Lu, HP Du… - IEEE/ACM Transactions …, 2024 - ieeexplore.ieee.org

This paper introduces a novel neural audio codec targeting high waveform sampling rates
and low bitrates named APCodec, which seamlessly integrates the strengths of parametric …

Cited by 20 · All 4 versions

Audiodec: An open-source streaming high-fidelity neural audio codec

Towards audio language modeling--an overview