Semisupervised Speech Data Extraction from Basque Parliament Sessions and Validation on Fully Bilingual Basque–Spanish ASR

M Penagarikano, A Varona, G Bordel… - Applied Sciences, 2023 - mdpi.com
In this paper, a semisupervised speech data extraction method is presented and applied to
create a new dataset designed for the development of fully bilingual Automatic Speech …

Unsupervised domain adaptation for speech recognition with unsupervised error correction

L Mai, J Carson-Berndsen - arXiv preprint arXiv:2209.12043, 2022 - arxiv.org
The transcription quality of automatic speech recognition (ASR) systems degrades
significantly when transcribing audios coming from unseen domains. We propose an …

Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models

Y Hu, C Chen, CHH Yang, C Qin, PY Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose an unsupervised adaptation framework, Self-TAught Recognizer (STAR), which
leverages unlabeled data to enhance the robustness of automatic speech recognition (ASR) …

[PDF] Semisupervised training of a fully bilingual ASR system for Basque and Spanish

M Penagarikano, A Varona, G Bordel… - Proceedings of the …, 2022 - researchgate.net
Automatic speech recognition (ASR) of speech signals with code-switching (an abrupt
language change common in bilingual communities) typically requires spoken language …

Overcoming domain mismatch in low resource sequence-to-sequence ASR models using hybrid generated pseudotranscripts

CF Li, F Keith, W Hartmann, M Snover… - arXiv preprint arXiv …, 2021 - arxiv.org
Sequence-to-sequence (seq2seq) models are competitive with hybrid models for automatic
speech recognition (ASR) tasks when large amounts of training data are available …

Combining Unsupervised and Text Augmented Semi-Supervised Learning For Low Resourced Autoregressive Speech Recognition

CF Li, F Keith, W Hartmann… - ICASSP 2022-2022 IEEE …, 2022 - ieeexplore.ieee.org
Recent advances in unsupervised representation learning have demonstrated the impact of
pretraining on large amounts of read speech. We adapt these techniques for domain …

Training Autoregressive Speech Recognition Models with Limited in-domain Supervision

CF Li, F Keith, W Hartmann, M Snover - arXiv preprint arXiv:2210.15135, 2022 - arxiv.org
Advances in self-supervised learning have significantly reduced the amount of transcribed
audio required for training. However, the majority of work in this area is focused on read …

Domain Adaptation‐Based Self‐Supervised ASR Models for Low‐Resource Target Domain

L Ashok Kumar, D Karthika Renuka… - … and Translation for …, 2024 - Wiley Online Library
Domain adaptation is the concept of improving the performance of a model on a target
domain, by leveraging the knowledge gained from the source domain. Speech recognition …

Enhancing the Performance of NMT Models Using the Data-Based Domain Adaptation Technique for Patent Translation

M Ahmed - 2023 - ir.lib.uwo.ca
During today's age of unparalleled connectivity, language and data have become powerful
tools capable of enabling effective communication and cross-cultural collaborations. Neural …

[PDF] Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering

P Rangappa, J Zuluaga-Gomez, S Madikeri, A Carofilis… - publications.idiap.ch
In real-world speech data processing, the scarcity of annotated data and the abundance of
unlabelled speech data present a significant challenge. To address this, we propose an …