Moshi: a speech-text foundation model for real-time dialogue

A Défossez, L Mazaré, M Orsini, A Royer… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Moshi, a speech-text foundation model and full-duplex spoken dialogue
framework. Current systems for spoken dialogue rely on pipelines of independent …

Recent advances in speech language models: A survey

W Cui, D Yu, X Jiao, Z Meng, G Zhang, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have recently garnered significant attention, primarily for
their capabilities in text-based interactions. However, natural human interaction often relies …

Llama-omni: Seamless speech interaction with large language models

Q Fang, S Guo, Y Zhou, Z Ma, S Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Models like GPT-4o enable real-time interaction with large language models (LLMs) through
speech, significantly enhancing user experience compared to traditional text-based …

Wavchat: A survey of spoken dialogue models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Style-talker: Finetuning audio language model and style-based text-to-speech model for fast spoken dialogue generation

YA Li, X Jiang, J Darefsky, G Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid advancement of large language models (LLMs) has significantly propelled the
development of text-based chatbots, demonstrating their capability to engage in coherent …

Analyzing and Mitigating Inconsistency in Discrete Audio Tokens for Neural Codec Language Models

W Liu, Z Guo, J Xu, Y Lv, Y Chu, Z Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Building upon advancements in Large Language Models (LLMs), the field of audio
processing has seen increased interest in training audio generation tasks with discrete …

DFADD: The Diffusion and Flow-Matching Based Audio Deepfake Dataset

J Du, IM Lin, IH Chiu, X Chen, H Wu… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Mainstream zero-shot TTS production systems like Voicebox and Seed-TTS achieve human-parity
speech by leveraging flow-matching and diffusion models, respectively …

Body of Her: A Preliminary Study on End-to-End Humanoid Agent

T Ao - arXiv preprint arXiv:2408.02879, 2024 - arxiv.org
An interactive virtual humanoid agent is a crucial interface with the physical world. A relatively
complete humanoid agent first needs to have a face and body, then possess both verbal and …

Neural Codec Source Tracing: Toward Comprehensive Attribution in Open-Set Condition

Y **e, X Wang, Z Wang, R Fu, Z Wen, S Cao… - arxiv preprint arxiv …, 2025 - arxiv.org
Current research in audio deepfake detection is gradually transitioning from binary
classification to multi-class tasks, referred to as the audio deepfake source tracing task. However …

PSLM: Parallel Generation of Text and Speech with LLMs for Low-Latency Spoken Dialogue Systems

K Mitsui, K Mitsuda, T Wakatsuki, Y Hono… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal language models that process both text and speech have potential for
applications in spoken dialogue systems. However, current models face two major …