Making flow-matching-based zero-shot text-to-speech laugh as you like

N Kanda, X Wang, SE Eskimez, M Thakker… - arXiv preprint arXiv …, 2024 - arxiv.org

Laughter is one of the most expressive and natural aspects of human speech, conveying
emotions, social cues, and humor. However, most text-to-speech (TTS) systems lack the …

Cited by 6 · Related articles · All 2 versions · HTML version

[PDF] arxiv.org

Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-To-Speech

H Wu, X Wang, SE Eskimez, M Thakker… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org

People change their tones of voice, often accompanied by nonverbal vocalizations (NVs)
such as laughter and cries, to convey rich emotions. However, most text-to-speech (TTS) …

Cited by 4 · Related articles · All 3 versions

[PDF] arxiv.org

How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

S Papi, P Polak, O Bojar, D Macháček - arXiv preprint arXiv:2412.18495, 2024 - arxiv.org

Simultaneous speech-to-text translation (SimulST) translates source-language speech into
target-language text concurrently with the speaker's speech, ensuring low latency for better …

Cited by 1 · Related articles · HTML version

[PDF] arxiv.org

Streaming Speaker Change Detection and Gender Classification for Transducer-Based Multi-Talker Speech Translation

P Wang, N Kanda, J Xue, J Li, X Wang… - arXiv preprint arXiv …, 2025 - arxiv.org

Streaming multi-talker speech translation is a task that involves not only generating accurate
and fluent translations with low latency but also recognizing when a speaker change occurs …

Related articles · HTML version

[PDF] arxiv.org

Soft Language Identification for Language-Agnostic Many-to-One End-to-End Speech Translation

P Wang, J Xue, J Li, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Language-agnostic many-to-one end-to-end speech translation models can convert audio
signals from different source languages into text in a target language. These models do not …

Related articles · HTML version

[PDF] arxiv.org

LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction

D Liang, X Li - arXiv preprint arXiv:2410.06670, 2024 - arxiv.org

This work proposes a frame-wise online/streaming end-to-end neural diarization (EEND)
method, which detects speaker activities in a frame-in-frame-out fashion. The proposed …

Related articles · All 2 versions · HTML version