Over-generation cannot be rewarded: Length-adaptive average lagging for simultaneous speech translation

S Papi, M Gaido, M Negri, M Turchi - arXiv preprint arXiv:2206.05807, 2022 - arxiv.org
Simultaneous speech translation (SimulST) systems aim at generating their output with the
lowest possible latency, which is normally computed in terms of Average Lagging (AL). In …
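
Since this entry centers on how latency is measured, a minimal Python sketch of the commonly used SimulEval-style Average Lagging computation for speech input is given below for reference. The optional ref_len argument hints at the length-adaptive correction the paper proposes (normalizing by the longer of hypothesis and reference so over-generation is not rewarded); function and argument names are illustrative, not taken from the paper or any library.

```python
def average_lagging(delays_ms, src_duration_ms, ref_len=None):
    """Sketch of Average Lagging (AL) for speech input, following the
    commonly used SimulEval-style definition. If ref_len is given, the
    normalization uses max(hypothesis length, reference length), i.e. the
    length-adaptive variant discussed in the paper above (assumption).

    delays_ms: d_i, the amount of source audio (ms) read before emitting target token i.
    src_duration_ms: total duration of the source audio in ms.
    ref_len: number of tokens in the reference translation (optional).
    """
    hyp_len = len(delays_ms)
    norm_len = max(hyp_len, ref_len) if ref_len else hyp_len
    gamma = norm_len / src_duration_ms  # target tokens per ms of source

    # tau: 1-based index of the first token emitted after the full source was read
    tau = hyp_len
    for i, d in enumerate(delays_ms, start=1):
        if d >= src_duration_ms:
            tau = i
            break

    lagging = [delays_ms[i - 1] - (i - 1) / gamma for i in range(1, tau + 1)]
    return sum(lagging) / tau


# Example: a 6-second utterance with 5 emitted tokens and a 6-token reference
# print(average_lagging([1000, 2000, 3500, 5000, 6000], 6000, ref_len=6))
```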

Attention as a guide for simultaneous speech translation

S Papi, M Negri, M Turchi - arXiv preprint arXiv:2212.07850, 2022 - arxiv.org
The study of the attention mechanism has sparked interest in many fields, such as language
modeling and machine translation. Although its patterns have been exploited to perform …

Recent advances in end-to-end simultaneous speech translation

X Liu, G Hu, Y Du, E He, YF Luo, C Xu, T Xiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Simultaneous speech translation (SimulST) is a demanding task that involves generating
translations in real-time while continuously processing speech input. This paper offers a …

Alignatt: Using attention-based audio-translation alignments as a guide for simultaneous speech translation

S Papi, M Turchi, M Negri - arXiv preprint arXiv:2305.11408, 2023 - arxiv.org
Attention is the core mechanism of today's most used architectures for natural language
processing and has been analyzed from many perspectives, including its effectiveness for …
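
The AlignAtt entry above describes using audio-translation attention as an emission guide. Below is a minimal sketch of such a decision rule, under the assumption that the core idea is to hold back a candidate token whose cross-attention peaks on the most recent audio frames; the function name, the averaging convention, and the frame_threshold value are illustrative, not the paper's actual interface.

```python
import torch

def alignatt_can_emit(cross_attention, frame_threshold=4):
    """Decide whether a candidate token may be emitted, following an
    AlignAtt-style rule (assumption): emit only if the cross-attention
    does not peak on the last `frame_threshold` audio frames currently
    available, which would suggest the token depends on audio not yet received.

    cross_attention: 1-D tensor of attention weights over encoder frames
                     for the candidate target token (e.g. averaged over heads).
    frame_threshold: number of final frames treated as "too recent"
                     (a tunable hyperparameter; the value here is illustrative).
    """
    num_frames = cross_attention.size(0)
    most_attended = int(torch.argmax(cross_attention).item())
    return most_attended < num_frames - frame_threshold


# Example: attention peaking well before the end of the available audio -> emit
# attn = torch.softmax(torch.randn(50), dim=0)
# print(alignatt_can_emit(attn, frame_threshold=4))
```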

Efficient yet competitive speech translation: FBK@IWSLT2022

M Gaido, S Papi, D Fucci, G Fiameni, M Negri… - arXiv preprint arXiv …, 2022 - arxiv.org
The primary goal of FBK's systems submission to the IWSLT 2022 offline and
simultaneous speech translation tasks is to reduce model training costs without sacrificing …

How "Real" is Your Real-Time Simultaneous Speech-to-Text Translation System?

S Papi, P Polak, O Bojar, D Macháček - arXiv preprint arXiv:2412.18495, 2024 - arxiv.org
Simultaneous speech-to-text translation (SimulST) translates source-language speech into
target-language text concurrently with the speaker's speech, ensuring low latency for better …

Adapting offline speech translation models for streaming with future-aware distillation and inference

B Fu, M Liao, K Fan, Z Huang, B Chen, Y Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
A popular approach to streaming speech translation is to employ a single offline model with
a wait-k policy to support different latency requirements, which is simpler than training …
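
The snippet above refers to the wait-k policy used to serve different latency requirements with a single offline model. For orientation, a minimal sketch of the wait-k read/write schedule is shown below; it is a simplified illustration that treats fixed-size audio chunks as source "segments", not the paper's implementation.

```python
def wait_k_schedule(num_src_segments, num_tgt_tokens, k=3):
    """Return the wait-k read/write schedule as a list of actions:
    read k source segments first, then alternate one write with one read
    until the source is exhausted, after which the remaining tokens are written.

    For speech, a "segment" is typically a fixed-size audio chunk
    (the chunk size is a system choice, not fixed by the policy).
    """
    actions = []
    read, written = 0, 0
    while written < num_tgt_tokens:
        # write target token i only after at least k + i source segments were read
        if read < min(k + written, num_src_segments):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions


# Example: 10 source chunks, 8 target tokens, k = 3
# print(wait_k_schedule(10, 8, k=3))
```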

Simul-Whisper: Attention-Guided Streaming Whisper with Truncation Detection

H Wang, G Hu, G Lin, WQ Zhang, J Li - arXiv preprint arXiv:2406.10052, 2024 - arxiv.org
As a robust and large-scale multilingual speech recognition model, Whisper has
demonstrated impressive results in many low-resource and out-of-distribution scenarios …

wav2vec-S: Adapting Pre-trained Speech Models for Streaming

B Fu, K Fan, M Liao, Y Chen, X Shi… - Findings of the …, 2024 - aclanthology.org
Pre-trained speech models, such as wav2vec 2.0, have significantly advanced speech-
related tasks, including speech recognition and translation. However, their applicability in …

Learning when to speak: Latency and quality trade-offs for simultaneous speech-to-speech translation with offline models

L Dugan, A Wadhawan, K Spence… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent work in speech-to-speech translation (S2ST) has focused primarily on offline
settings, where the full input utterance is available before any output is given. This, however …