- Academic Search

FunAudioLLM: Voice understanding and generation foundation models for natural interaction between humans and LLMs

K An, Q Chen, C Deng, Z Du, C Gao, Z Gao… - arXiv preprint arXiv …, 2024 - arxiv.org

This report introduces FunAudioLLM, a model family designed to enhance natural voice
interactions between humans and large language models (LLMs). At its core are two …

Cited by 25

WavChat: A survey of spoken dialogue models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Cited by 8

Towards Controllable Speech Synthesis in the Era of Large Language Models: A Survey

T Xie, Y Rong, P Zhang, L Liu - arXiv preprint arXiv:2412.06602, 2024 - arxiv.org

Text-to-speech (TTS), also known as speech synthesis, is a prominent research area that
aims to generate natural-sounding human speech from text. Recently, with the increasing …

Cited by 1

ACE: A generative cross-modal retrieval framework with coarse-to-fine semantic modeling

M Fang, S Ji, J Zuo, H Huang, Y Xia, J Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org

Generative retrieval, which has demonstrated effectiveness in text-to-text retrieval, utilizes a
sequence-to-sequence model to directly generate candidate identifiers based on natural …

Cited by 5

MinMo: A multimodal large language model for seamless voice interaction

Q Chen, Y Chen, Y Chen, M Chen, Y Chen… - arXiv preprint arXiv …, 2025 - arxiv.org

Recent advancements in large language models (LLMs) and multimodal speech-text
models have laid the groundwork for seamless voice interactions, enabling real-time …

Cited by 2

Speech Watermarking with Discrete Intermediate Representations

S Ji, Z Jiang, J Zuo, M Fang, Y Chen, T Jin… - arXiv preprint arXiv …, 2024 - arxiv.org

Speech watermarking techniques can proactively mitigate the potential harmful
consequences of instant voice cloning techniques. These techniques involve the insertion of …

Semantic Residual for Multimodal Unified Discrete Representation

H Huang, S Wang, Y Xia - arXiv preprint arXiv:2412.19128, 2024 - arxiv.org

Recent research in the domain of multimodal unified representations predominantly
employs codebook as representation forms, utilizing Vector Quantization (VQ) for …

Controlspeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style...
