Maestro: Matched speech text representations through modality matching
We present Maestro, a self-supervised training method to unify representations learnt from
speech and text modalities. Self-supervised learning from speech signals aims to learn the …
DUB: Discrete unit back-translation for speech translation
How can speech-to-text translation (ST) perform as well as machine translation (MT)? The
key point is to bridge the modality gap between speech and text so that useful MT …
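(Illustrative aside: the back-translation idea can be sketched roughly as below, assuming speech is first quantized into discrete units with clustered self-supervised features; the helper names, toy data, and commented model calls are assumptions, not the DUB authors' released code.)

```python
# Illustrative sketch of discrete-unit back-translation, not the authors' implementation.
# Assumption: speech is mapped to discrete units (clustered SSL features); a text-to-unit
# model then "back-translates" monolingual target text into pseudo units, yielding extra
# (unit sequence, text) pairs for the unit-to-text speech translation model.
import numpy as np
from sklearn.cluster import KMeans

def speech_to_units(features: np.ndarray, kmeans: KMeans) -> list[int]:
    """Quantize frame-level speech features (T x D) into a discrete unit sequence."""
    return kmeans.predict(features).tolist()

# Toy example: fit a quantizer on random "SSL features" just to show the data flow.
rng = np.random.default_rng(0)
ssl_features = rng.normal(size=(1000, 16))            # stand-in for HuBERT/wav2vec features
kmeans = KMeans(n_clusters=100, n_init=4, random_state=0).fit(ssl_features)

utterance = rng.normal(size=(200, 16))                # one utterance's features
units = speech_to_units(utterance, kmeans)

# Back-translation step (hypothetical models): text -> pseudo units -> extra training pairs.
monolingual_text = ["an example target sentence"]
# pseudo_units = text_to_unit_model.generate(monolingual_text)   # hypothetical seq2seq call
# synthetic_pairs = list(zip(pseudo_units, monolingual_text))    # added to ST training data
print(f"real utterance -> {len(units)} discrete units, vocab size {kmeans.n_clusters}")
```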
Leveraging large text corpora for end-to-end speech summarization
End-to-end speech summarization (E2E SSum) is a technique to directly generate summary
sentences from speech. Compared with the cascade approach, which combines automatic …
Generating data with text-to-speech and large-language models for conversational speech recognition
Currently, a common approach in many speech processing tasks is to leverage large scale
pre-trained models by fine-tuning them on in-domain data for a particular application. Yet …
Text-only domain adaptation for end-to-end ASR using integrated text-to-mel-spectrogram generator
We propose an end-to-end Automatic Speech Recognition (ASR) system that can be trained
on transcribed speech data, text-only data, or a mixture of both. The proposed model uses …
On the effect of purely synthetic training data for different automatic speech recognition architectures
In this work we evaluate the utility of synthetic data for training automatic speech recognition
(ASR). We use the ASR training data to train a text-to-speech (TTS) system similar to …
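(Illustrative aside: one way purely synthetic TTS data can be mixed with real recordings for ASR training is sketched below; the synthesize() call, file layout, and mixing ratio are assumptions for illustration, not details taken from the paper.)

```python
# Illustrative sketch: build an ASR training list that mixes real and TTS-synthesized audio.
import random
from dataclasses import dataclass

@dataclass
class Utterance:
    audio_path: str
    transcript: str
    is_synthetic: bool

def synthesize(text: str, out_path: str) -> str:
    """Placeholder for a TTS system trained on the ASR corpus's own transcripts."""
    # e.g., tts_model.tts_to_file(text=text, file_path=out_path) with a real TTS toolkit
    return out_path

def build_training_set(real: list[Utterance], extra_texts: list[str],
                       synthetic_ratio: float = 0.5, seed: int = 0) -> list[Utterance]:
    """Append synthetic utterances until they make up `synthetic_ratio` of the data."""
    rng = random.Random(seed)
    n_synth = int(len(real) * synthetic_ratio / (1.0 - synthetic_ratio))
    synth = [Utterance(synthesize(t, f"synth_{i}.wav"), t, True)
             for i, t in enumerate(rng.choices(extra_texts, k=n_synth))]
    mixed = real + synth
    rng.shuffle(mixed)
    return mixed

real_data = [Utterance(f"real_{i}.wav", f"transcript {i}", False) for i in range(8)]
train = build_training_set(real_data, ["some unseen text", "another unseen text"], 0.5)
print(sum(u.is_synthetic for u in train), "synthetic of", len(train), "total utterances")
```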
When whisper meets TTS: Domain adaptation using only synthetic speech data
JC Vásquez-Correa, H Arzelus… - … Conference on Text …, 2023 - Springer
Automatic Speech Recognition is among the most important areas of Artificial
Intelligence research today. One of the most notable advances in this area is the …
Investigating phoneme similarity with artificially accented speech
M Masson, J Carson-Berndsen - Proceedings of the 20th …, 2023 - aclanthology.org
While the deep learning revolution has led to significant performance improvements in
speech recognition, accented speech remains a challenge. Current approaches to this …
Towards Selection of Text-to-speech Data to Augment ASR Training
This paper presents a method for selecting appropriate synthetic speech samples from a
given large text-to-speech (TTS) dataset as supplementary training data for an automatic …
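(Illustrative aside: a generic selection loop of the kind this line of work relies on is sketched below; the scoring function and thresholds are stand-ins, not necessarily the paper's actual selection criterion.)

```python
# Illustrative selection sketch: keep synthetic utterances whose score from a seed model
# falls in a target band, discarding trivially easy or degenerate samples.
from typing import Callable

def select_synthetic(samples: list[dict],
                     score_fn: Callable[[dict], float],
                     low: float = 0.2, high: float = 0.8) -> list[dict]:
    """Filter TTS samples by score; score_fn could be per-utterance ASR loss or 1 - confidence."""
    kept = []
    for s in samples:
        score = score_fn(s)
        if low <= score <= high:
            kept.append({**s, "score": score})
    return kept

# Toy usage with a dummy scorer standing in for a seed ASR model.
candidates = [{"audio": f"tts_{i}.wav", "text": f"sentence {i}"} for i in range(5)]
dummy_score = lambda s: (hash(s["audio"]) % 100) / 100.0
selected = select_synthetic(candidates, dummy_score)
print(f"kept {len(selected)} of {len(candidates)} synthetic samples")
```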
Enhancing Automatic Speech Recognition: Effects of Semantic Audio Filtering on Models Performance
This paper presents a novel methodology for enhancing Automatic Speech Recognition
(ASR) performance by utilizing contrastive learning to filter synthetic audio data. We address …
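(Illustrative aside: the filtering step can be pictured as below, assuming a contrastively trained audio/text encoder pair that embeds both modalities in a shared space; the embedding inputs and threshold here are placeholders, not the paper's model or settings.)

```python
# Illustrative sketch of semantic filtering: embed synthetic audio and its source text in a
# shared space and drop pairs whose cosine similarity is low.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def filter_pairs(audio_embs: np.ndarray, text_embs: np.ndarray,
                 threshold: float = 0.6) -> list[int]:
    """Return indices of (synthetic audio, text) pairs that pass the similarity check."""
    return [i for i, (a, t) in enumerate(zip(audio_embs, text_embs))
            if cosine(a, t) >= threshold]

# Toy usage with random vectors standing in for embeddings from a contrastive encoder.
rng = np.random.default_rng(1)
audio_embs = rng.normal(size=(10, 32))
text_embs = audio_embs + 0.1 * rng.normal(size=(10, 32))   # mostly well-aligned pairs
keep = filter_pairs(audio_embs, text_embs)
print(f"keeping {len(keep)} of {len(audio_embs)} synthetic utterances")
```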