Reimagining speech: a scoping review of deep learning-based methods for non-parallel voice conversion

AR Bargum, S Serafin, C Erkut - Frontiers in signal processing, 2024 - frontiersin.org
Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios
is gaining increasing popularity. Although many of the works in the field of voice …

Reimagining speech: A scoping review of deep learning-powered voice conversion

AR Bargum, S Serafin, C Erkut - arXiv preprint arXiv:2311.08104, 2023 - arxiv.org
Research on deep learning-powered voice conversion (VC) in speech-to-speech scenarios
is getting increasingly popular. Although many of the works in the field of voice conversion …

Multi-speaker speech synthesis from electromyographic signals by soft speech unit prediction

K Scheck, T Schultz - ICASSP 2023-2023 IEEE International …, 2023 - ieeexplore.ieee.org
Electromyographic (EMG) signals of articulatory muscles reflect the speech production
process even if the user is speaking silently, i.e., moving the articulators without producing …

Nonparallel emotional voice conversion for unseen speaker-emotion pairs using dual domain adversarial network & virtual domain pairing

N Shah, M Singh, N Takahashi… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Primary goal of an emotional voice conversion (EVC) system is to convert the emotion of a
given speech signal from one style to another style without modifying the linguistic content of …

Privacy Versus Emotion Preservation Trade-Offs in Emotion-Preserving Speaker Anonymization

Z Cai, HL Xinyuan, A Garg… - 2024 IEEE Spoken …, 2024 - ieeexplore.ieee.org
Advances in speech technology now allow unprecedented access to personally identifiable
information through speech. To protect such information, the differential privacy field has …

CCSRD: Content-centric speech representation disentanglement learning for end-to-end speech translation

X Zhao, H Sun, Y Lei, S Zhu… - Findings of the Association …, 2023 - aclanthology.org
Deep neural networks have demonstrated their capacity in extracting features from speech
inputs. However, these features may include non-linguistic speech factors such as timbre …

Using joint training speaker encoder with consistency loss to achieve cross-lingual voice conversion and expressive voice conversion

H Guo, C Liu, CT Ishi, H Ishiguro - 2023 IEEE Automatic Speech …, 2023 - ieeexplore.ieee.org
Voice conversion systems have made significant advancements in terms of naturalness and
similarity in common voice conversion tasks. However, their performance in more complex …

Fine-grained quantitative emotion editing for speech generation

S Inoue, K Zhou, S Wang, H Li - 2024 Asia Pacific Signal and …, 2024 - ieeexplore.ieee.org
It remains a significant challenge how to quantitatively control the expressiveness of speech
emotion in speech generation. In this work, we propose an approach for quantitative …

Scalability and diversity of StarGANv2-VC in Arabic emotional voice conversion: Overcoming data limitations and enhancing performance

AH Meftah, YA Alotaibi, SA Selouani - Journal of King Saud University …, 2024 - Elsevier
Emotional Voice Conversion (EVC) for under-resourced languages like Arabic
faces challenges due to limited emotional speech data. This study explored strategies to …

MSM-VC: High-fidelity source style transfer for non-parallel voice conversion by multi-scale style modeling

Z Wang, X Wang, Q Xie, T Li, L Xie… - … /ACM Transactions on …, 2023 - ieeexplore.ieee.org
In addition to conveying the linguistic content from source speech to converted speech,
maintaining the speaking style of source speech also plays an important role in the voice …