A metaverse: Taxonomy, components, applications, and open challenges
SM Park, YG Kim - IEEE access, 2022 - ieeexplore.ieee.org
Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is
based on the social value of Generation Z that online and offline selves are not different …
Transformer: A general framework from machine translation to others
Machine translation is an important and challenging task that aims at automatically
translating natural language sentences from one language into another. Recently …
SpeechT5: Unified-modal encoder-decoder pre-training for spoken language processing
Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural
language processing models, we propose a unified-modal SpeechT5 framework that …
SeamlessM4T-Massively Multilingual & Multimodal Machine Translation
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …
STEMM: Self-learning with speech-text manifold mixup for speech translation
How to learn a better speech representation for end-to-end speech-to-text translation (ST)
with limited labeled data? Existing techniques often attempt to transfer powerful machine …
Recent advances in direct speech-to-text translation
Recently, speech-to-text translation has attracted more and more attention and many studies
have emerged rapidly. In this paper, we present a comprehensive survey on direct speech …
Unified speech-text pre-training for speech translation and recognition
We describe a method to jointly pre-train speech and text in an encoder-decoder modeling
framework for speech translation and recognition. The proposed method incorporates four …
Multilingual speech translation with efficient finetuning of pretrained models
We present a simple yet effective approach to build multilingual speech-to-text (ST)
translation by efficient transfer learning from pretrained speech encoder and text decoder …
Cross-modal contrastive learning for speech translation
How can we learn unified representations for spoken utterances and their written text?
Learning similar representations for semantically similar speech and text is important for …
UnitY: Two-pass direct speech-to-speech translation with discrete units
Direct speech-to-speech translation (S2ST), in which all components can be optimized
jointly, is advantageous over cascaded approaches to achieve fast inference with a …