A deep learning loss function based on the perceptual evaluation of the speech quality
This letter proposes a perceptual metric for speech quality evaluation, which is suitable, as a
loss function, for training deep learning methods. This metric, derived from the perceptual …
A consolidated view of loss functions for supervised deep learning-based speech enhancement
Deep learning-based speech enhancement for real-time applications recently made large
advancements. Due to the lack of a tractable perceptual optimization target, many myths …
Conditional sound generation using neural discrete time-frequency representation learning
Deep generative models have recently achieved impressive performance in speech and
music synthesis. However, compared to the generation of those domain-specific sounds …
A comparative study of time and frequency domain approaches to deep learning based speech enhancement
Deep learning has recently made a breakthrough in the speech enhancement process.
Some architectures are based on a time domain representation, while others operate in the …
Real-time monaural speech enhancement with short-time discrete cosine transform
Q Li, F Gao, H Guan, K Ma - arXiv preprint arXiv:2102.04629, 2021 - arxiv.org
Speech enhancement algorithms based on deep learning have been improved in terms of
speech intelligibility and perceptual quality greatly. Many methods focus on enhancing the …
Psychoacoustic calibration of loss functions for efficient end-to-end neural audio coding
Conventional audio coding technologies commonly leverage human perception of sound, or
psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the …
Deep noise suppression maximizing non-differentiable PESQ mediated by a non-intrusive PESQNet
Speech enhancement employing deep neural networks (DNNs) for denoising is called deep
noise suppression (DNS). The DNS trained with mean squared error (MSE) losses cannot …
Using generalized Gaussian distributions to improve regression error modeling for deep learning-based speech enhancement
From a statistical perspective, the conventional minimum mean squared error (MMSE)
criterion can be considered as the maximum likelihood (ML) solution under an assumed …
Using separate losses for speech and noise in mask-based speech enhancement
Estimating time-frequency domain masks for speech enhancement using deep learning
approaches has recently become a popular field in research. In this paper, we propose a …
A perceptual weighting filter loss for DNN training in speech enhancement
Single-channel speech enhancement with deep neural networks (DNNs) has shown
promising performance and is thus intensively being studied. In this paper, instead of …