Transformer: A general framework from machine translation to others

Y Zhao, J Zhang, C Zong - Machine Intelligence Research, 2023 - Springer
Abstract: Machine translation is an important and challenging task that aims at automatically
translating natural language sentences from one language into another. Recently …

End-to-end speech recognition: A survey

R Prabhavalkar, T Hori, TN Sainath… - … on Audio, Speech …, 2023 - ieeexplore.ieee.org
In the last decade of automatic speech recognition (ASR) research, the introduction of deep
learning has brought considerable reductions in word error rate of more than 50% relative …

ONE-PEACE: Exploring one general representation model toward unlimited modalities

P Wang, S Wang, J Lin, S Bai, X Zhou, J Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we explore a scalable way for building a general representation model toward
unlimited modalities. We release ONE-PEACE, a highly extensible model with 4B …

Recent advances in direct speech-to-text translation

C Xu, R Ye, Q Dong, C Zhao, T Ko, M Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …

Cross-modal contrastive learning for speech translation

R Ye, M Wang, L Li - arXiv preprint arXiv:2205.02444, 2022 - arxiv.org
How can we learn unified representations for spoken utterances and their written text?
Learning similar representations for semantically similar speech and text is important for …

SLTUNET: A simple unified model for sign language translation

B Zhang, M Müller, R Sennrich - arXiv preprint arXiv:2305.01778, 2023 - arxiv.org
Despite recent successes with neural models for sign language translation (SLT), translation
quality still lags behind spoken languages because of the data scarcity and modality gap …

SpeechUT: Bridging speech and text with hidden-unit for encoder-decoder based speech-text pre-training

Z Zhang, L Zhou, J Ao, S Liu, L Dai, J Li… - arXiv preprint arXiv …, 2022 - arxiv.org
The rapid development of single-modal pre-training has prompted researchers to pay more
attention to cross-modal pre-training methods. In this paper, we propose a unified-modal …

UnitY: Two-pass direct speech-to-speech translation with discrete units

H Inaguma, S Popuri, I Kulikov, PJ Chen… - arXiv preprint arXiv …, 2022 - arxiv.org
Direct speech-to-speech translation (S2ST), in which all components can be optimized
jointly, is advantageous over cascaded approaches for achieving fast inference with a …

SpeechLM: Enhanced speech pre-training with unpaired textual data

Z Zhang, S Chen, L Zhou, Y Wu, S Ren… - … on Audio, Speech …, 2024 - ieeexplore.ieee.org
How to boost speech pre-training with textual data is an unsolved problem due to the fact
that speech and text are very different modalities with distinct characteristics. In this paper …

Pre-training for speech translation: CTC meets optimal transport

PH Le, H Gong, C Wang, J Pino… - International …, 2023 - proceedings.mlr.press
The gap between speech and text modalities is a major challenge in speech-to-text
translation (ST). Different methods have been proposed to reduce this gap, but most of them …