A deep learning loss function based on the perceptual evaluation of the speech quality

JM Martin-Donas, AM Gomez… - IEEE Signal …, 2018 - ieeexplore.ieee.org
This letter proposes a perceptual metric for speech quality evaluation, which is suitable, as a
loss function, for training deep learning methods. This metric, derived from the perceptual …

A consolidated view of loss functions for supervised deep learning-based speech enhancement

S Braun, I Tashev - 2021 44th International Conference on …, 2021 - ieeexplore.ieee.org
Deep learning-based speech enhancement for real-time applications recently made large
advancements. Due to the lack of a tractable perceptual optimization target, many myths …

Conditional sound generation using neural discrete time-frequency representation learning

X Liu, T Iqbal, J Zhao, Q Huang… - 2021 IEEE 31st …, 2021 - ieeexplore.ieee.org
Deep generative models have recently achieved impressive performance in speech and
music synthesis. However, compared to the generation of those domain-specific sounds …

A comparative study of time and frequency domain approaches to deep learning based speech enhancement

SA Nossier, J Wall, M Moniri, C Glackin… - … Joint Conference on …, 2020 - ieeexplore.ieee.org
Deep learning has recently made a breakthrough in the speech enhancement process.
Some architectures are based on a time domain representation, while others operate in the …

Real-time monaural speech enhancement with short-time discrete cosine transform

Q Li, F Gao, H Guan, K Ma - arXiv preprint arXiv:2102.04629, 2021 - arxiv.org
Speech enhancement algorithms based on deep learning have been improved in terms of
speech intelligibility and perceptual quality greatly. Many methods focus on enhancing the …

Psychoacoustic calibration of loss functions for efficient end-to-end neural audio coding

K Zhen, MS Lee, J Sung, S Beack… - IEEE Signal Processing …, 2020 - ieeexplore.ieee.org
Conventional audio coding technologies commonly leverage human perception of sound, or
psychoacoustics, to reduce the bitrate while preserving the perceptual quality of the …

Deep noise suppression maximizing non-differentiable PESQ mediated by a non-intrusive PESQNet

Z Xu, M Strake, T Fingscheidt - IEEE/ACM Transactions on …, 2022 - ieeexplore.ieee.org
Speech enhancement employing deep neural networks (DNNs) for denoising is called deep
noise suppression (DNS). The DNS trained with mean squared error (MSE) losses cannot …

Using generalized Gaussian distributions to improve regression error modeling for deep learning-based speech enhancement

L Chai, J Du, QF Liu, CH Lee - IEEE/ACM Transactions on …, 2019 - ieeexplore.ieee.org
From a statistical perspective, the conventional minimum mean squared error (MMSE)
criterion can be considered as the maximum likelihood (ML) solution under an assumed …

Using separate losses for speech and noise in mask-based speech enhancement

Z Xu, S Elshamy, T Fingscheidt - ICASSP 2020-2020 IEEE …, 2020 - ieeexplore.ieee.org
Estimating time-frequency domain masks for speech enhancement using deep learning
approaches has recently become a popular field in research. In this paper, we propose a …

A perceptual weighting filter loss for DNN training in speech enhancement

Z Zhao, S Elshamy, T Fingscheidt - 2019 IEEE Workshop on …, 2019 - ieeexplore.ieee.org
Single-channel speech enhancement with deep neural networks (DNNs) has shown
promising performance and is thus intensively being studied. In this paper, instead of …