- Academic Search

Advancing large language models to capture varied speaking styles and respond properly in spoken conversations

GT Lin, CH Chiang, H Lee - arXiv preprint arXiv:2402.12786, 2024 - arxiv.org

In spoken dialogue, even if two current turns are the same sentence, their responses might
still differ when they are spoken in different styles. The spoken styles, containing …

Cited by 19

WavChat: A survey of spoken dialogue models

S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in spoken dialogue models, exemplified by systems like GPT-4o,
have captured significant attention in the speech domain. Compared to traditional three-tier …

Cited by 8

Generative expressive conversational speech synthesis

R Liu, Y Hu, Y Ren, X Yin, H Li - Proceedings of the 32nd ACM …, 2024 - dl.acm.org

Conversational Speech Synthesis (CSS) aims to express a target utterance with the proper
speaking style in a user-agent conversation setting. Existing CSS methods employ effective …

Cited by 4

Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback

GT Lin, PG Shivakumar, A Gourav, Y Gu… - arXiv preprint arXiv …, 2024 - arxiv.org

While textless Spoken Language Models (SLMs) have shown potential in end-to-end
speech-to-speech modeling, they still lag behind text-based Large Language Models …

Cited by 4

Style-Talker: Finetuning audio language model and style-based text-to-speech model for fast spoken dialogue generation

YA Li, X Jiang, J Darefsky, G Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org

The rapid advancement of large language models (LLMs) has significantly propelled the
development of text-based chatbots, demonstrating their capability to engage in coherent …

Cited by 3

MinMo: A multimodal large language model for seamless voice interaction

Q Chen, Y Chen, Y Chen, M Chen, Y Chen… - arXiv preprint arXiv …, 2025 - arxiv.org

Recent advancements in large language models (LLMs) and multimodal speech-text
models have laid the groundwork for seamless voice interactions, enabling real-time …

Cited by 2

Universal Speech Token Learning Via Low-Bitrate Neural Codec and Pretrained Representations

X Jiang, X Peng, Y Zhang, Y Lu - IEEE Journal of Selected …, 2024 - ieeexplore.ieee.org

Current large speech language models are mainly based on semantic tokens from
discretization of self-supervised learned representations and acoustic tokens from a neural …

Cited by 1

E-chat: Emotion-sensitive Spoken Dialogue System with Large Language Models

H Xue, Y Liang, B Mu, S Zhang, M Chen… - 2024 IEEE 14th …, 2024 - ieeexplore.ieee.org

This study focuses on emotion-sensitive spoken dialogue in human-machine speech
interaction. With the advancement of Large Language Models (LLMs), dialogue systems can …

Cited by 4

Can LLMs Understand the Implication of Emphasized Sentences in Dialogue?

GT Lin, H Lee - arXiv preprint arXiv:2406.11065, 2024 - arxiv.org

Emphasis is a crucial component in human communication, which indicates the speaker's
intention and implication beyond pure text in dialogue. While Large Language Models …

Cited by 1

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech

W Kang, J Jia, C Wu, W Zhou, E Lakomkin… - arXiv preprint arXiv …, 2024 - arxiv.org

As speech becomes an increasingly common modality for interacting with large language
models (LLMs), it is becoming desirable to develop systems where LLMs can take into …

Cited by 1

Paralinguistics-enhanced large language modeling of spoken dialogue
