An overview of voice conversion and its challenges: From statistical modeling to deep learning

B Sisman, J Yamagishi, S King… - IEEE/ACM Transactions …, 2020 - ieeexplore.ieee.org
Speaker identity is one of the important characteristics of human speech. In voice
conversion, we change the speaker identity from one to another, while keeping the linguistic …

Expressive TTS training with frame and style reconstruction loss

R Liu, B Sisman, G Gao, H Li - IEEE/ACM Transactions on …, 2021 - ieeexplore.ieee.org
We propose a novel training strategy for Tacotron-based text-to-speech (TTS) system that
improves the speech styling at utterance level. One of the key challenges in prosody …

Deep learning approaches in topics of singing information processing

C Gupta, H Li, M Goto - IEEE/ACM Transactions on Audio …, 2022 - ieeexplore.ieee.org
Singing, the vocal production of musical tones, is one of the most important elements of
music. Addressing the needs of real-world applications, the study of technologies related to …

Elucidate gender fairness in singing voice transcription

X Gu, W Zeng, Y Wang - Proceedings of the 31st ACM International …, 2023 - dl.acm.org
It is widely known that males and females typically possess different sound characteristics
when singing, such as timbre and pitch, but it has never been explored whether these …

SLIONS: A karaoke application to enhance foreign language learning

D Murad, R Wang, D Turnbull, Y Wang - Proceedings of the 26th ACM …, 2018 - dl.acm.org
Singing songs can be an engaging and effective activity when learning a foreign language.
In this paper, we describe a multi-language karaoke application called SLIONS: Singing and …

Analysis and modeling of timbre perception features in musical sounds

W Jiang, J Liu, X Zhang, S Wang, Y Jiang - Applied Sciences, 2020 - mdpi.com
A novel technique is proposed for the analysis and modeling of timbre perception features,
including a new terminology system for evaluating timbre in musical instruments. This …

Automatic Pronunciation Evaluation of Singing

C Gupta, H Li, Y Wang - Interspeech, 2018 - isca-archive.org
In this work, we develop a strategy to automatically evaluate pronunciation of singing. We
apply singing-adapted automatic speech recognizer (ASR) in a two-stage approach for …

Perception-aware attack: Creating adversarial music via reverse-engineering human perception

R Duan, Z Qu, S Zhao, L Ding, Y Liu, Z Lu - Proceedings of the 2022 …, 2022 - dl.acm.org
Previous adversarial audio attacks have mainly focused on ensuring the effectiveness of
attacking an audio signal classifier via creating a small noise-like perturbation on the …

Wavelet Analysis of Speaker Dependent and Independent Prosody for Voice Conversion

B Sisman, H Li - Interspeech, 2018 - isca-archive.org
Thus far, voice conversion studies are mainly focused on the conversion of spectrum.
However, speaker identity is also characterized by its prosody features, such as fundamental …

Speech-to-singing voice conversion: The challenges and strategies for improving vocal conversion processes

K Vijayan, H Li, T Toda - IEEE Signal Processing Magazine, 2018 - ieeexplore.ieee.org
Speech-to-singing (STS) conversion is the task of converting the read lyrics of a song,
spoken in natural manner, to proper singing. The most important aspect of the task is to …