Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
SeamlessM4T: Massively Multilingual & Multimodal Machine Translation
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …
Whisper-AT: Noise-robust automatic speech recognizers are also strong general audio event taggers
In this paper, we focus on Whisper, a recent automatic speech recognition model trained
with a massive 680k-hour labeled speech corpus recorded in diverse conditions. We first …
A joint speech enhancement and self-supervised representation learning framework for noise-robust speech recognition
Though speech enhancement (SE) can be used to improve speech quality in noisy
environments, it may also cause distortions that degrade the performance of automatic …
How does pre-trained wav2vec 2.0 perform on domain-shifted ASR? An extensive benchmark on air traffic control communications
Recent work on self-supervised pre-training focuses on leveraging large-scale unlabeled
speech data to build robust end-to-end (E2E) acoustic models (AM) that can be later fine …
Robust data2vec: Noise-robust speech representation learning for ASR by combining regression and improved contrastive learning
Self-supervised pre-training methods based on contrastive learning or regression tasks can
utilize more unlabeled data to improve the performance of automatic speech recognition …
Improving distortion robustness of self-supervised speech processing tasks with domain adaptation
Speech distortions are a long-standing problem that degrades the performance of
supervised speech processing models. It is high time that we enhance the …
Gradient remedy for multi-task learning in end-to-end noise-robust speech recognition
Speech enhancement (SE) has proved effective in reducing noise from noisy speech signals
for downstream automatic speech recognition (ASR), where a multi-task learning strategy is …
Wav2code: Restore clean speech representations via codebook lookup for noise-robust ASR
Automatic speech recognition (ASR) has achieved remarkable success thanks to recent
advances in deep learning, but it usually degrades significantly under real-world noisy …
De'HuBERT: Disentangling noise in a self-supervised model for robust speech recognition
Existing self-supervised pre-trained speech models have offered an effective way to
leverage massive unannotated corpora to build good automatic speech recognition (ASR) …