CTC alignments improve autoregressive translation

B Yan, S Dalmia, Y Higuchi, G Neubig, F Metze… - arXiv preprint arXiv …, 2022 - arxiv.org

Connectionist Temporal Classification (CTC) is a widely used approach for automatic
speech recognition (ASR) that performs conditionally independent monotonic alignment …

Cited by 36 · All 6 versions

[PDF] arxiv.org

Deep speech synthesis from MRI-based articulatory representations

P Wu, T Li, Y Lu, Y Zhang, J Lian, AW Black… - arXiv preprint arXiv …, 2023 - arxiv.org

In this paper, we study articulatory synthesis, a speech synthesis method using human vocal
tract information that offers a way to develop efficient, generalizable and interpretable …

Cited by 20 · All 10 versions

[PDF] arxiv.org

Recent advances in end-to-end simultaneous speech translation

X Liu, G Hu, Y Du, E He, YF Luo, C Xu, T Xiao… - arXiv preprint arXiv …, 2024 - arxiv.org

Simultaneous speech translation (SimulST) is a demanding task that involves generating
translations in real-time while continuously processing speech input. This paper offers a …

Cited by 1 · All 5 versions

[PDF] arxiv.org

ESPnet-ST-v2: Multipurpose spoken language translation toolkit

B Yan, J Shi, Y Tang, H Inaguma, Y Peng… - arXiv preprint arXiv …, 2023 - arxiv.org

ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the
broadening interests of the spoken language translation community. ESPnet-ST-v2 supports …

Cited by 14 · All 8 versions

[PDF] arxiv.org

BASS: Block-wise adaptation for speech summarization

R Sharma, K Zheng, S Arora, S Watanabe… - arXiv preprint arXiv …, 2023 - arxiv.org

End-to-end speech summarization has been shown to improve performance over cascade
baselines. However, such models are difficult to train on very large inputs (dozens of …

Cited by 6 · All 8 versions

[PDF] arxiv.org

Incremental blockwise beam search for simultaneous speech translation with controllable quality-latency tradeoff

P Polák, B Yan, S Watanabe, A Waibel… - arXiv preprint arXiv …, 2023 - arxiv.org

Blockwise self-attentional encoder models have recently emerged as one promising end-to-end
approach to simultaneous speech translation. These models employ a blockwise beam …

Cited by 5 · All 6 versions

[HTML] sciencedirect.com

[HTML] Decoupled structure for improved adaptability of end-to-end models

K Deng, PC Woodland - Speech Communication, 2024 - Elsevier

Although end-to-end (E2E) trainable automatic speech recognition (ASR) has shown great
success by jointly learning acoustic and linguistic information, it still suffers from the effect of …

Cited by 3 · All 5 versions

[PDF] arxiv.org

How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

S Papi, P Polák, O Bojar, D Macháček - arXiv preprint arXiv:2412.18495, 2024 - arxiv.org

Simultaneous speech-to-text translation (SimulST) translates source-language speech into
target-language text concurrently with the speaker's speech, ensuring low latency for better …

Cited by 1 · All 2 versions

[PDF] arxiv.org

Long-form end-to-end speech translation via latent alignment segmentation

P Polák, O Bojar - 2024 IEEE Spoken Language Technology …, 2024 - ieeexplore.ieee.org

Contemporary datasets provide an oracle segmentation into sentences based on human-annotated
transcripts and translations. However, the segmentation into sentences is not …

Cited by 2 · All 3 versions

[PDF] arxiv.org

End-to-end single-channel speaker-turn aware conversational speech translation

J Zuluaga-Gomez, Z Huang, X Niu, R Paturi… - arXiv preprint arXiv …, 2023 - arxiv.org

Conventional speech-to-text translation (ST) systems are trained on single-speaker
utterances, and they may not generalize to real-life scenarios where the audio contains …

Cited by 4 · All 7 versions

Saved to My library:

Blockwise streaming transformer for spoken language understanding and simultaneous speech...

CTC alignments improve autoregressive translation

Deep speech synthesis from MRI-based articulatory representations

Recent advances in end-to-end simultaneous speech translation

ESPnet-ST-v2: Multipurpose spoken language translation toolkit

BASS: Block-wise adaptation for speech summarization

Incremental blockwise beam search for simultaneous speech translation with controllable quality-latency tradeoff

[HTML] Decoupled structure for improved adaptability of end-to-end models

How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

Long-form end-to-end speech translation via latent alignment segmentation

End-to-end single-channel speaker-turn aware conversational speech translation