Deep learning for audio signal processing
Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …
Sequence-to-sequence models can directly translate foreign speech
We present a recurrent encoder-decoder deep neural network architecture that directly
translates speech in one language into text in another. The model does not explicitly …
translates speech in one language into text in another. The model does not explicitly …
Multilingual speech translation with efficient finetuning of pretrained models
We present a simple yet effective approach to build multilingual speech-to-text (ST)
translation by efficient transfer learning from pretrained speech encoder and text decoder …
translation by efficient transfer learning from pretrained speech encoder and text decoder …
End-to-end speech-to-text translation: A survey
N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier
Abstract Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …
one language to text in another language. It finds its application in various domains, such as …
Tied multitask learning for neural speech translation
We explore multitask models for neural translation of speech, augmenting them in order to
reflect two intuitive notions. First, we introduce a model where the second task decoder …
reflect two intuitive notions. First, we introduce a model where the second task decoder …
Speech translation and the end-to-end promise: Taking stock of where we are
M Sperber, M Paulik - arxiv preprint arxiv:2004.06358, 2020 - arxiv.org
Over its three decade history, speech translation has experienced several shifts in its
primary research themes; moving from loosely coupled cascades of speech recognition and …
primary research themes; moving from loosely coupled cascades of speech recognition and …
Multimodal machine translation through visuals and speech
Multimodal machine translation involves drawing information from more than one modality,
based on the assumption that the additional modalities will contain useful alternative views …
based on the assumption that the additional modalities will contain useful alternative views …
Covost: A diverse multilingual speech-to-text translation corpus
Spoken language translation has recently witnessed a resurgence in popularity, thanks to
the development of end-to-end models and the creation of new corpora, such as Augmented …
the development of end-to-end models and the creation of new corpora, such as Augmented …
A comparative study on end-to-end speech to text translation
Recent advances in deep learning show that end-to-end speech to text translation model is
a promising approach to direct the speech translation field. In this work, we provide an …
a promising approach to direct the speech translation field. In this work, we provide an …
A brief overview of unsupervised neural speech representation learning
Unsupervised representation learning for speech processing has matured greatly in the last
few years. Work in computer vision and natural language processing has paved the way, but …
few years. Work in computer vision and natural language processing has paved the way, but …