Direct speech-to-speech translation with a sequence-to-sequence model
We present an attention-based sequence-to-sequence neural network which can directly
translate speech from one language into speech in another language, without relying on an …
A generative model for raw audio using transformer architectures
P Verma, C Chafe - … Conference on Digital Audio Effects (DAFx …, 2021 - ieeexplore.ieee.org
This paper proposes a novel way of doing audio synthesis at the waveform level using
Transformer architectures. We propose a deep neural network for generating waveforms …
Tibetan–Chinese speech-to-speech translation based on discrete units
Z Gong, X Xu, Y Zhao - Scientific Reports, 2025 - nature.com
Speech-to-speech translation (S2ST) has evolved from cascade systems which integrate
Automatic Speech Recognition (ASR), Machine Translation (MT), and Text-to-Speech (TTS) …
Neural Architectures Learning Fourier Transforms, Signal Processing and Much More....
P Verma - arXiv preprint arXiv:2308.10388, 2023 - arxiv.org
This report will explore and answer fundamental questions about taking Fourier Transforms
and tying it with recent advances in AI and neural architecture. One interpretation of the …
Kazakh-Uzbek Speech Cascade Machine Translation on Complete Set of Endings
Studies of speech-to-speech machine translation for Turkic languages are practically absent
due to the difficulties of creating parallel speech corpora for training neural models …
Multi-Task Self-Supervised Learning Based Tibetan-Chinese Speech-to-Speech Translation
R Liu, Y Zhao, X Xu - 2023 International Conference on Asian …, 2023 - ieeexplore.ieee.org
Speech-to-speech translation tasks are commonly tackled by using a three-level cascade
system which comprises of speech recognition, machine translation, and speech synthesis …
Learning to model aspects of hearing perception using neural loss functions
P Verma, J Berger - arXiv preprint arXiv:1912.05683, 2019 - arxiv.org
We present a framework to model the perceived quality of audio signals by combining
convolutional architectures, with ideas from classical signal processing, and describe an …