Findings of the IWSLT 2022 Evaluation Campaign.

A Anastasopoulos, L Barrault, L Bentivogli… - Proceedings of the 19th …, 2022 - cris.fbk.eu
The evaluation campaign of the 19th International Conference on Spoken Language
Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline …

UnitY: Two-pass direct speech-to-speech translation with discrete units

H Inaguma, S Popuri, I Kulikov, PJ Chen… - arXiv preprint arXiv …, 2022 - arxiv.org
Direct speech-to-speech translation (S2ST), in which all components can be optimized
jointly, is advantageous over cascaded approaches to achieve fast inference with a …

Epsilon sampling rocks: Investigating sampling strategies for minimum Bayes risk decoding for machine translation

M Freitag, B Ghorbani, P Fernandes - arXiv preprint arXiv:2305.09860, 2023 - arxiv.org
Recent advances in machine translation (MT) have shown that Minimum Bayes Risk (MBR)
decoding can be a powerful alternative to beam search decoding, especially when …

CTC alignments improve autoregressive translation

B Yan, S Dalmia, Y Higuchi, G Neubig, F Metze… - arXiv preprint arXiv …, 2022 - arxiv.org
Connectionist Temporal Classification (CTC) is a widely used approach for automatic
speech recognition (ASR) that performs conditionally independent monotonic alignment …

It's MBR all the way down: Modern generation techniques through the lens of minimum Bayes risk

A Bertsch, A Xie, G Neubig, MR Gormley - arXiv preprint arXiv:2310.01387, 2023 - arxiv.org
Minimum Bayes Risk (MBR) decoding is a method for choosing the outputs of a machine
learning system based not on the output with the highest probability, but the output with the …

ESPnet-ST-v2: Multipurpose spoken language translation toolkit

B Yan, J Shi, Y Tang, H Inaguma, Y Peng… - arXiv preprint arXiv …, 2023 - arxiv.org
ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the
broadening interests of the spoken language translation community. ESPnet-ST-v2 supports …

Align, write, re-order: Explainable end-to-end speech translation via operation sequence generation

M Omachi, B Yan, S Dalmia, Y Fujita… - ICASSP 2023-2023 …, 2023 - ieeexplore.ieee.org
The black-box nature of end-to-end speech-to-text translation (E2E ST) makes it difficult to
understand how source language inputs are being mapped to the target language. To solve …

ESPnet-ONNX: Bridging a gap between research and production

M Someki, Y Higuchi, T Hayashi… - 2022 Asia-Pacific …, 2022 - ieeexplore.ieee.org
In the field of deep learning, researchers often focus on inventing novel neural network
models and improving benchmarks. In contrast, application developers are interested in …

Strategies for improving low resource speech to text translation relying on pre-trained ASR models

S Kesiraju, M Sarvas, T Pavlicek, C Macaire… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents techniques and findings for improving the performance of low-resource
speech to text translation (ST). We conducted experiments on both simulated and real-low …

ALADAN at IWSLT24 Low-resource Arabic Dialectal Speech Translation Task

WB Kheder, J Jon, A Beyer, A Messaoudi… - Proceedings of the …, 2024 - aclanthology.org
This paper presents ALADAN's approach to the IWSLT 2024 Dialectal and Low-resource
shared task, focusing on Levantine Arabic (apc) and Tunisian Arabic (aeb) to English …