A review of deep learning techniques for speech processing

A Mehrish, N Majumder, R Bharadwaj, R Mihalcea… - Information …, 2023 - Elsevier
The field of speech processing has undergone a transformative shift with the advent of deep
learning. The use of multiple processing layers has enabled the creation of models capable …

Self-supervised speech representation learning: A review

A Mohamed, H Lee, L Borgholt… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Although supervised deep learning has revolutionized speech and audio processing, it has
necessitated the building of specialist models for individual tasks and application scenarios …

data2vec: A general framework for self-supervised learning in speech, vision and language

A Baevski, WN Hsu, Q Xu, A Babu… - … on Machine Learning, 2022 - proceedings.mlr.press
While the general idea of self-supervised learning is identical across modalities, the actual
algorithms and objectives differ widely because they were developed with a single modality …
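
The unified objective behind data2vec can be made concrete with a short sketch: a student network regresses the contextualized representations that an exponential-moving-average (EMA) teacher produces for the unmasked input, at the masked positions. The toy PyTorch version below is illustrative only; the encoder, dimensions, masking rate, and the use of just the final teacher layer (the paper averages and normalizes several top layers) are assumptions, not the paper's exact setup.

```python
# Minimal sketch of a data2vec-style objective: a student encoder regresses
# the contextualized targets an EMA teacher computes on the unmasked input.
import copy
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, dim=256, layers=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.net = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, x):              # x: (batch, time, dim)
        return self.net(x)

dim = 256
student = Encoder(dim)
teacher = copy.deepcopy(student)       # EMA copy, never updated by gradient
for p in teacher.parameters():
    p.requires_grad_(False)
mask_emb = nn.Parameter(torch.randn(dim))

def data2vec_loss(x, mask):
    # Teacher encodes the full input to produce contextualized targets.
    with torch.no_grad():
        targets = teacher(x)
    # Student sees the masked view: masked timesteps replaced by mask_emb.
    x_masked = torch.where(mask.unsqueeze(-1), mask_emb.expand_as(x), x)
    preds = student(x_masked)
    # Regress the targets only at masked positions.
    return nn.functional.smooth_l1_loss(preds[mask], targets[mask])

@torch.no_grad()
def ema_update(tau=0.999):
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(tau).add_(ps, alpha=1 - tau)

x = torch.randn(2, 50, dim)            # toy feature sequence
mask = torch.rand(2, 50) < 0.15        # 15% of timesteps masked
loss = data2vec_loss(x, mask)
loss.backward(); ema_update()
```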

FLAVA: A foundational language and vision alignment model

A Singh, R Hu, V Goswami… - Proceedings of the …, 2022 - openaccess.thecvf.com
State-of-the-art vision and vision-and-language models rely on large-scale visio-linguistic
pretraining for obtaining good performance on a variety of downstream tasks. Generally …
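
The "alignment" part of such pretraining is typically a global image-text contrastive term. Below is a minimal sketch of the symmetric InfoNCE loss of the kind FLAVA combines with unimodal and multimodal masked objectives (not shown here); the pooled embeddings, batch size, and temperature are placeholder assumptions.

```python
# Sketch of a global image-text contrastive alignment loss (symmetric
# InfoNCE): matched pairs sit on the diagonal of the similarity matrix.
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    # img_emb, txt_emb: (batch, dim) pooled outputs of the two encoders.
    img = F.normalize(img_emb, dim=-1)
    txt = F.normalize(txt_emb, dim=-1)
    logits = img @ txt.t() / temperature    # (batch, batch) similarities
    labels = torch.arange(len(img))         # matched pairs on the diagonal
    # Image-to-text and text-to-image cross-entropy, averaged.
    return 0.5 * (F.cross_entropy(logits, labels)
                  + F.cross_entropy(logits.t(), labels))

loss = contrastive_alignment_loss(torch.randn(8, 512), torch.randn(8, 512))
```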

WavLM: Large-scale self-supervised pre-training for full stack speech processing

S Chen, C Wang, Z Chen, Y Wu, S Liu… - IEEE Journal of …, 2022 - ieeexplore.ieee.org
Self-supervised learning (SSL) has achieved great success in speech recognition, but other
speech processing tasks remain comparatively underexplored. As speech signal …
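
WavLM's core recipe is masked speech denoising and prediction: the input is corrupted by mixing in an interfering utterance, while the targets are discrete pseudo-labels of the clean speech (HuBERT-style k-means codes) predicted at masked positions. The sketch below is a toy rendering of that idea; the GRU encoder, mixing rule, codebook size, and shapes are stand-ins, not the paper's architecture.

```python
# Toy sketch of WavLM-style masked speech denoising and prediction.
import torch
import torch.nn as nn

vocab, dim = 100, 256                 # pseudo-label codebook size, feature dim
encoder = nn.GRU(dim, dim, num_layers=2, batch_first=True)  # stand-in encoder
head = nn.Linear(dim, vocab)
mask_emb = nn.Parameter(torch.randn(dim))

def utterance_mix(x, other, scale=0.3):
    # Overlap part of an interfering utterance onto the main one.
    t = x.size(1) // 2
    x = x.clone()
    x[:, :t] = x[:, :t] + scale * other[:, :t]
    return x

def wavlm_style_loss(noisy, targets, mask):
    # noisy: corrupted features; targets: (batch, time) clean pseudo-labels.
    x = torch.where(mask.unsqueeze(-1), mask_emb.expand_as(noisy), noisy)
    h, _ = encoder(x)
    logits = head(h)
    # Predict the *clean* labels only where the input was masked, so the
    # model must denoise and model speech structure at the same time.
    return nn.functional.cross_entropy(logits[mask], targets[mask])

feats = torch.randn(2, 40, dim)                      # clean features
noisy = utterance_mix(feats, torch.randn(2, 40, dim))
labels = torch.randint(0, vocab, (2, 40))            # stand-in k-means codes
mask = torch.rand(2, 40) < 0.2
loss = wavlm_style_loss(noisy, labels, mask)
```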

SUPERB: Speech processing Universal PERformance Benchmark

S Yang, PH Chi, YS Chuang, CIJ Lai… - arXiv preprint arXiv …, 2021 - arxiv.org
Self-supervised learning (SSL) has proven vital for advancing research in natural language
processing (NLP) and computer vision (CV). The paradigm pretrains a shared model on …
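
SUPERB's evaluation protocol keeps the pretrained upstream model frozen and trains, per task, only a learnable weighted sum over the upstream's layer outputs plus a lightweight prediction head. A minimal sketch of that recipe, with random stand-in features in place of a real upstream model:

```python
# Sketch of the SUPERB recipe: frozen upstream, learnable layer weights,
# and a small task head are the only trained components.
import torch
import torch.nn as nn

class WeightedLayerSum(nn.Module):
    def __init__(self, num_layers):
        super().__init__()
        self.weights = nn.Parameter(torch.zeros(num_layers))

    def forward(self, layer_feats):       # (layers, batch, time, dim)
        w = torch.softmax(self.weights, dim=0)
        return (w.view(-1, 1, 1, 1) * layer_feats).sum(dim=0)

num_layers, dim, num_classes = 12, 256, 5
combine = WeightedLayerSum(num_layers)
head = nn.Linear(dim, num_classes)        # lightweight task head

# Frozen upstream features for one batch (random stand-ins here).
with torch.no_grad():
    feats = torch.randn(num_layers, 2, 40, dim)

pooled = combine(feats).mean(dim=1)       # mean-pool over time
logits = head(pooled)                     # e.g. utterance classification
```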

w2v-BERT: Combining contrastive learning and masked language modeling for self-supervised speech pre-training

YA Chung, Y Zhang, W Han, CC Chiu… - 2021 IEEE Automatic …, 2021 - ieeexplore.ieee.org
Motivated by the success of masked language modeling (MLM) in pre-training natural
language processing models, we propose w2v-BERT that explores MLM for self-supervised …
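
The combination the title refers to can be sketched as two stacked losses: a lower contrastive module that induces a discrete token inventory (wav2vec 2.0-style), and an upper module trained with masked-prediction cross-entropy over those token IDs. The code below is a deliberately simplified toy (linear stand-ins for the conformer stacks, a fixed rather than learned codebook), meant only to show how the two terms are summed.

```python
# Toy sketch of the w2v-BERT idea: contrastive token learning plus MLM
# over the resulting token IDs, optimized jointly.
import torch
import torch.nn as nn
import torch.nn.functional as F

dim, codebook_size = 256, 64
codebook = nn.Parameter(torch.randn(codebook_size, dim))
contrastive_enc = nn.Linear(dim, dim)   # stand-in for the lower stack
mlm_enc = nn.Linear(dim, dim)           # stand-in for the upper stack
mlm_head = nn.Linear(dim, codebook_size)

def w2v_bert_losses(x, mask):
    # x: (batch, time, dim) features. Assign each frame its nearest
    # codebook entry to serve as the discrete target ID.
    with torch.no_grad():
        dists = torch.cdist(x, codebook.unsqueeze(0).expand(len(x), -1, -1))
        ids = dists.argmin(-1)
    c = contrastive_enc(x)              # contextual vectors
    # Contrastive term: at masked frames, the context vector should be
    # closer to its own target entry than to the other codebook entries.
    sims = F.normalize(c, dim=-1) @ F.normalize(codebook, dim=-1).t()
    l_contrastive = F.cross_entropy(sims[mask] / 0.1, ids[mask])
    # MLM term: the upper stack predicts the same token IDs directly.
    logits = mlm_head(mlm_enc(c))
    l_mlm = F.cross_entropy(logits[mask], ids[mask])
    return l_contrastive + l_mlm

x = torch.randn(2, 30, dim)
mask = torch.rand(2, 30) < 0.3
loss = w2v_bert_losses(x, mask)
```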

Efficient self-supervised learning with contextualized target representations for vision, speech and language

A Baevski, A Babu, WN Hsu… - … Conference on Machine …, 2023 - proceedings.mlr.press
Current self-supervised learning algorithms are often modality-specific and require large
amounts of computational resources. To address these issues, we increase the training …
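
The efficiency gain comes largely from amortization: the expensive teacher pass is computed once per sample and its contextualized targets are reused across several differently masked student views. A toy sketch of that multi-masking idea, building on the data2vec-style setup above (linear stand-ins for the encoders; the paper additionally skips encoding masked timesteps, omitted here):

```python
# Sketch of multi-mask target reuse: one teacher pass, several student views.
import torch
import torch.nn as nn

dim = 256
student = nn.Linear(dim, dim)           # stand-in for the student encoder
teacher = nn.Linear(dim, dim)           # stand-in EMA teacher
mask_emb = nn.Parameter(torch.randn(dim))

def multi_mask_loss(x, num_masks=4, p=0.3):
    with torch.no_grad():
        targets = teacher(x)            # one teacher pass per sample
    loss = 0.0
    for _ in range(num_masks):          # reuse the targets for each mask
        mask = torch.rand(x.shape[:2]) < p
        xm = torch.where(mask.unsqueeze(-1), mask_emb.expand_as(x), x)
        preds = student(xm)
        loss = loss + nn.functional.mse_loss(preds[mask], targets[mask])
    return loss / num_masks

loss = multi_mask_loss(torch.randn(2, 40, dim))
```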

Transfer learning based physics-informed neural networks for solving inverse problems in engineering structures under different loading scenarios

C Xu, BT Cao, Y Yuan, G Meschke - Computer Methods in Applied …, 2023 - Elsevier
Recently, a class of machine learning methods called physics-informed neural networks
(PINNs) has been proposed and has gained prevalence in solving various scientific computing …
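
A PINN folds the governing equation into the loss via automatic differentiation, and transfer learning here means warm-starting training for a new loading scenario from weights fitted to a previous one. The sketch below illustrates this on a deliberately simple 1D equilibrium equation u''(x) + f(x) = 0 with fixed ends; the equation, network, and training schedule are illustrative, not the paper's engineering structures.

```python
# Toy PINN with transfer learning across loading scenarios.
import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(),
                    nn.Linear(32, 32), nn.Tanh(),
                    nn.Linear(32, 1))

def pinn_loss(f):
    # Collocation points inside the domain [0, 1].
    x = torch.rand(64, 1, requires_grad=True)
    u = net(x)
    du = torch.autograd.grad(u.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    residual = (d2u + f(x)).pow(2).mean()     # physics residual
    # Boundary conditions u(0) = u(1) = 0.
    bc = (net(torch.zeros(1, 1)).pow(2).mean()
          + net(torch.ones(1, 1)).pow(2).mean())
    return residual + bc

def train(f, steps=500):
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        loss = pinn_loss(f)
        loss.backward()
        opt.step()

# Source scenario: fit one load, then save the weights.
train(lambda x: torch.sin(torch.pi * x))
pretrained = {k: v.clone() for k, v in net.state_dict().items()}

# Target scenario: a different load; warm-start from the pretrained
# weights (transfer) and fine-tune for far fewer steps than from scratch.
net.load_state_dict(pretrained)
train(lambda x: 2 * torch.sin(torch.pi * x), steps=200)
```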

A fine-tuned wav2vec 2.0/HuBERT benchmark for speech emotion recognition, speaker verification and spoken language understanding

Y Wang, A Boumadane, A Heba - arXiv preprint arXiv:2111.02735, 2021 - arxiv.org
Speech self-supervised models such as wav2vec 2.0 and HuBERT are making revolutionary
progress in Automatic Speech Recognition (ASR). However, they have not been totally …
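
The benchmark's basic recipe, attaching a lightweight task head to a pretrained speech encoder and fine-tuning, can be sketched as follows. This uses wav2vec 2.0 via HuggingFace transformers as one concrete encoder choice; the checkpoint name, mean-pooling, and four-class emotion head are illustrative assumptions.

```python
# Sketch of fine-tuning a pretrained speech encoder for an utterance-level
# task such as speech emotion recognition.
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
head = nn.Linear(encoder.config.hidden_size, 4)   # e.g. 4 emotion classes

def forward(waveform):                # waveform: (batch, samples) at 16 kHz
    hidden = encoder(waveform).last_hidden_state  # (batch, frames, dim)
    return head(hidden.mean(dim=1))   # mean-pool frames, then classify

logits = forward(torch.randn(2, 16000))           # one second of audio
loss = nn.functional.cross_entropy(logits, torch.tensor([0, 2]))
```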