Deep learning for audio signal processing

H Purwins, B Li, T Virtanen, J Schlüter… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
Given the recent surge in developments of deep learning, this paper provides a review of the
state-of-the-art deep learning techniques for audio signal processing. Speech, music, and …

Sequence-to-sequence models can directly translate foreign speech

RJ Weiss, J Chorowski, N Jaitly, Y Wu… - arXiv preprint arXiv …, 2017 - arxiv.org
We present a recurrent encoder-decoder deep neural network architecture that directly
translates speech in one language into text in another. The model does not explicitly …

Multilingual speech translation with efficient finetuning of pretrained models

X Li, C Wang, Y Tang, C Tran, Y Tang, J Pino… - arXiv preprint arXiv …, 2020 - arxiv.org
We present a simple yet effective approach to build multilingual speech-to-text (ST)
translation by efficient transfer learning from pretrained speech encoder and text decoder …

End-to-end speech-to-text translation: A survey

N Sethiya, CK Maurya - Computer Speech & Language, 2024 - Elsevier
Speech-to-Text (ST) translation pertains to the task of converting speech signals in
one language to text in another language. It finds its application in various domains, such as …

Tied multitask learning for neural speech translation

A Anastasopoulos, D Chiang - arXiv preprint arXiv:1802.06655, 2018 - arxiv.org
We explore multitask models for neural translation of speech, augmenting them in order to
reflect two intuitive notions. First, we introduce a model where the second task decoder …

Speech translation and the end-to-end promise: Taking stock of where we are

M Sperber, M Paulik - arXiv preprint arXiv:2004.06358, 2020 - arxiv.org
Over its three decade history, speech translation has experienced several shifts in its
primary research themes; moving from loosely coupled cascades of speech recognition and …

Multimodal machine translation through visuals and speech

U Sulubacak, O Caglayan, SA Grönroos, A Rouhe… - Machine …, 2020 - Springer
Multimodal machine translation involves drawing information from more than one modality,
based on the assumption that the additional modalities will contain useful alternative views …

CoVoST: A diverse multilingual speech-to-text translation corpus

C Wang, J Pino, A Wu, J Gu - arXiv preprint arXiv:2002.01320, 2020 - arxiv.org
Spoken language translation has recently witnessed a resurgence in popularity, thanks to
the development of end-to-end models and the creation of new corpora, such as Augmented …

A comparative study on end-to-end speech to text translation

P Bahar, T Bieschke, H Ney - 2019 IEEE Automatic Speech …, 2019 - ieeexplore.ieee.org
Recent advances in deep learning show that end-to-end speech to text translation model is
a promising approach to direct the speech translation field. In this work, we provide an …

A brief overview of unsupervised neural speech representation learning

L Borgholt, JD Havtorn, J Edin, L Maaløe… - arXiv preprint arXiv …, 2022 - arxiv.org
Unsupervised representation learning for speech processing has matured greatly in the last
few years. Work in computer vision and natural language processing has paved the way, but …