Towards audio language modeling - an overview

H Wu, X Chen, YC Lin, K Chang, HL Chung… - arXiv preprint arXiv …, 2024 - arxiv.org
Neural audio codecs were initially introduced to compress audio data into compact codes to
reduce transmission latency. Researchers recently discovered the potential of codecs as …

Sparks of large audio models: A survey and outlook

S Latif, M Shoukat, F Shamshad, M Usama… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey paper provides a comprehensive overview of the recent advancements and
challenges in applying large language models to the field of audio signal processing. Audio …

Qwen-Audio: Advancing universal audio understanding via unified large-scale audio-language models

Y Chu, J Xu, X Zhou, Q Yang, S Zhang, Z Yan… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, instruction-following audio-language models have received broad attention for
audio interaction with humans. However, the absence of pre-trained audio models capable …

UniAudio: An audio foundation model toward universal audio generation

D Yang, J Tian, X Tan, R Huang, S Liu, X Chang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated the capability to handle a variety of
generative tasks. This paper presents the UniAudio system, which, unlike prior task-specific …

VALL-E 2: Neural codec language models are human parity zero-shot text to speech synthesizers

S Chen, S Liu, L Zhou, Y Liu, X Tan, J Li, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces VALL-E 2, the latest advancement in neural codec language models
that marks a milestone in zero-shot text-to-speech synthesis (TTS), achieving human parity …

UniAudio: Towards universal audio generation with large language models

D Yang, J Tian, X Tan, R Huang, S Liu… - … on Machine Learning, 2024 - openreview.net
Audio generation is a major branch of generative AI research. Compared with prior works in
this area that are commonly task-specific with heavy domain knowledge, this paper …

ELLA-V: Stable neural codec language modeling with alignment-guided sequence reordering

Y Song, Z Chen, X Wang, Z Ma, X Chen - arXiv preprint arXiv:2401.07333, 2024 - arxiv.org
The language model (LM) approach based on acoustic and linguistic prompts, such as
VALL-E, has achieved remarkable progress in the field of zero-shot audio generation …

BASE TTS: Lessons from building a billion-parameter text-to-speech model on 100k hours of data

M Łajszczak, G Cámbara, Y Li, F Beyhan… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce a text-to-speech (TTS) model called BASE TTS, which stands for $\textbf{B}$ig
$\textbf{A}$daptive $\textbf{S}$treamable TTS with $\textbf{E}$mergent abilities …

E2 TTS: Embarrassingly easy fully non-autoregressive zero-shot TTS

SE Eskimez, X Wang, M Thakker, C Li… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
This paper introduces Embarrassingly Easy Text-to-Speech (E2 TTS), a fully
non-autoregressive zero-shot text-to-speech system that offers human-level naturalness and …

Codec-SUPERB: An in-depth analysis of sound codec models

H Wu, HL Chung, YC Lin, YK Wu, X Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
The sound codec's dual roles in minimizing data transmission latency and serving as
tokenizers underscore its critical importance. Recent years have witnessed significant …