A review of deep learning techniques for speech processing
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …
Self-supervised speech representation learning: A review
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …
data2vec: A general framework for self-supervised learning in speech, vision and language
While the general idea of self-supervised learning is identical across modalities, the actual
algorithms and objectives differ widely because they were developed with a single modality …
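The data2vec entry above describes a single objective shared across modalities: a student network sees a masked view of the input and regresses the latent representations that a teacher network produced from the full input. Below is a minimal sketch of that idea, assuming a toy encoder, a crude zero-masking scheme, and an illustrative EMA decay; the paper itself uses modality-specific Transformer encoders and a learned mask token.

```python
# Minimal masked-latent-prediction sketch in the spirit of data2vec.
# All sizes, the masking scheme, and the EMA decay are assumptions.
import copy
import torch
import torch.nn as nn

class TinyEncoder(nn.Module):
    """Stand-in encoder; the paper uses a Transformer per modality."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.GELU(), nn.Linear(dim, dim))
    def forward(self, x):
        return self.net(x)

student = TinyEncoder()
teacher = copy.deepcopy(student)          # teacher is an EMA copy of the student
for p in teacher.parameters():
    p.requires_grad = False

def ema_update(teacher, student, decay=0.999):
    with torch.no_grad():
        for t, s in zip(teacher.parameters(), student.parameters()):
            t.mul_(decay).add_(s, alpha=1 - decay)

x = torch.randn(8, 32, 64)                # (batch, time, feature) toy input
mask = torch.rand(8, 32) < 0.5            # mask roughly half the time steps

with torch.no_grad():
    targets = teacher(x)                  # targets come from the *unmasked* input

masked_x = x.clone()
masked_x[mask] = 0.0                      # crude masking; real models use a learned token
pred = student(masked_x)

loss = nn.functional.mse_loss(pred[mask], targets[mask])  # regress latents at masked steps
loss.backward()
ema_update(teacher, student)
```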
FLAVA: A foundational language and vision alignment model
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic
pretraining for obtaining good performance on a variety of downstream tasks. Generally …
WavLM: Large-scale self-supervised pre-training for full stack speech processing
Self-supervised learning (SSL) achieves great success in speech recognition, while limited
exploration has been attempted for other speech processing tasks. As speech signal …
SUPERB: Speech processing universal performance benchmark
Self-supervised learning (SSL) has proven vital for advancing research in natural language
processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on …
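The SUPERB snippet above centers on one frozen pretrained model reused across tasks, with only lightweight task-specific heads trained. A common recipe in that setting is a learned weighted sum over the frozen model's layer outputs; the sketch below illustrates that pattern with placeholder tensors standing in for frozen features, and all dimensions are toy assumptions.

```python
# Frozen shared encoder + lightweight trainable head, SUPERB-style.
# layer_feats stands in for the frozen model's per-layer outputs.
import torch
import torch.nn as nn

num_layers, hidden = 12, 64
layer_feats = [torch.randn(2, 100, hidden) for _ in range(num_layers)]  # frozen features

layer_weights = nn.Parameter(torch.zeros(num_layers))    # trainable mixing weights
head = nn.Linear(hidden, 10)                             # trainable task head, 10 toy classes

w = torch.softmax(layer_weights, dim=0)
mixed = sum(wi * f for wi, f in zip(w, layer_feats))     # weighted layer combination
logits = head(mixed.mean(dim=1))                         # pool over time, then classify
```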
w2v-BERT: Combining contrastive learning and masked language modeling for self-supervised speech pre-training
Motivated by the success of masked language modeling (MLM) in pre-training natural
language processing models, we propose w2v-BERT that explores MLM for self-supervised …
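Per the snippet above, w2v-BERT trains a contrastive module and a masked-prediction (MLM) module end to end. The toy sketch below shows only the shape of that combined objective: random tensors stand in for the encoders and quantizer, the codebook size and loss weight are assumptions, and index 0 is taken as the positive in the contrastive term.

```python
# Shape-only sketch of a combined contrastive + MLM objective.
import torch
import torch.nn.functional as F

batch, steps, vocab = 4, 16, 320           # 320 ~ a small codebook of discrete units
mask = torch.rand(batch, steps) < 0.5

# Contrastive part: identify the true quantized target among distractors.
contrastive_logits = torch.randn(batch, steps, 100)   # 1 positive + 99 negatives
contrastive_loss = F.cross_entropy(
    contrastive_logits[mask],
    torch.zeros(int(mask.sum()), dtype=torch.long))   # index 0 = positive, by convention here

# MLM part: predict the discrete unit id at each masked position.
mlm_logits = torch.randn(batch, steps, vocab)
unit_ids = torch.randint(0, vocab, (batch, steps))
mlm_loss = F.cross_entropy(mlm_logits[mask], unit_ids[mask])

loss = contrastive_loss + 1.0 * mlm_loss   # the weighting is an assumption
```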
Efficient self-supervised learning with contextualized target representations for vision, speech and language
Current self-supervised learning algorithms are often modality-specific and require large
amounts of computational resources. To address these issues, we increase the training …
Transfer learning based physics-informed neural networks for solving inverse problems in engineering structures under different loading scenarios
Recently, a class of machine learning methods called physics-informed neural networks
(PINNs) has been proposed and gained prevalence in solving various scientific computing …
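The PINN entry above rests on combining a data/boundary loss with a physics residual computed by automatic differentiation. Here is a minimal sketch for a toy 1-D ODE u''(x) = -u(x) with boundary condition u(0) = 0; the network, collocation points, and equal loss weighting are illustrative choices, and the paper's transfer-learning angle would amount to initializing the network from weights trained on a previous loading scenario rather than from scratch.

```python
# Minimal PINN sketch: physics residual + boundary loss on a toy ODE.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))

x = torch.linspace(0, 1, 50).reshape(-1, 1).requires_grad_(True)  # collocation points
u = net(x)
du = torch.autograd.grad(u, x, torch.ones_like(u), create_graph=True)[0]
d2u = torch.autograd.grad(du, x, torch.ones_like(du), create_graph=True)[0]

physics_loss = ((d2u + u) ** 2).mean()                    # residual of u'' + u = 0
bc_loss = net(torch.zeros(1, 1)).pow(2).mean()            # enforce u(0) = 0
loss = physics_loss + bc_loss                             # equal weighting, an assumption
loss.backward()
```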
A fine-tuned wav2vec 2.0/HuBERT benchmark for speech emotion recognition, speaker verification and spoken language understanding
Speech self-supervised models such as wav2vec 2.0 and HuBERT are making revolutionary
progress in Automatic Speech Recognition (ASR). However, they have not been totally …
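The benchmark above evaluates pretrained speech models fine-tuned for non-ASR tasks such as speech emotion recognition. A hedged sketch of that setup follows, using the Hugging Face transformers API; the checkpoint name, the mean-pooling head, and the four-class labels are assumptions, not the paper's exact configuration.

```python
# Pretrained wav2vec 2.0 encoder + lightweight classification head.
# Checkpoint and head design are illustrative assumptions.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
head = nn.Linear(encoder.config.hidden_size, 4)   # e.g. 4 emotion classes

waveform = torch.randn(2, 16000)                  # two 1 s clips at 16 kHz
hidden = encoder(waveform).last_hidden_state      # (batch, frames, hidden)
logits = head(hidden.mean(dim=1))                 # mean-pool frames, then classify
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 2]))
```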