Multi-speaker expressive speech synthesis via multiple factors decoupling | X Zhu, Y Lei, K Song, Y Zhang, T Li, L Xie | ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 20 | 2023 |
SELM: Speech enhancement using discrete tokens and language models | Z Wang, X Zhu, Z Zhang, YJ Lv, N Jiang, G Zhao, L Xie | ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 19 | 2024 |
Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation | H Li, L Xue, H Guo, X Zhu, Y Lv, L Xie, Y Chen, H Yin, Z Li | arXiv preprint arXiv:2406.07422, 2024 | 18 | 2024 |
Cross-speaker emotion transfer through information perturbation in emotional speech synthesis | Y Lei, S Yang, X Zhu, L Xie, D Su | IEEE Signal Processing Letters 29, 1948-1952, 2022 | 18 | 2022 |
Metts: Multilingual emotional text-to-speech by cross-speaker and cross-lingual emotion transfer | X Zhu, Y Lei, T Li, Y Zhang, H Zhou, H Lu, L Xie | IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024 | 14 | 2024 |
Vec-tok speech: Speech vectorization and tokenization for neural speech generation | X Zhu, Y Lv, Y Lei, T Li, W He, H Zhou, H Lu, L Xie | arXiv preprint arXiv:2310.07246, 2023 | 11 | 2023 |
DiCLET-TTS: Diffusion model based cross-lingual emotion transfer for text-to-speech—A study between English and Mandarin | T Li, C Hu, J Cong, X Zhu, J Li, Q Tian, Y Wang, L Xie | IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023 | 10 | 2023 |
Unistyle: Unified style modeling for speaking style captioning and stylistic speech synthesis | X Zhu, W Tian, X Wang, L He, Y Xiao, X Wang, X Tan, S Zhao, L Xie | Proceedings of the 32nd ACM International Conference on Multimedia, 7513-7522, 2024 | 5 | 2024 |
SponTTS: modeling and transferring spontaneous style for TTS | H Li, X Zhu, L Xue, Y Song, Y Chen, L Xie | ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 5 | 2024 |
Accent-VITS: accent transfer for end-to-end TTS | L Ma, Y Zhang, X Zhu, Y Lei, Z Ning, P Zhu, L Xie | National Conference on Man-Machine Speech Communication, 203-214, 2023 | 4 | 2023 |
Contrastive context-speech pretraining for expressive text-to-speech synthesis | Y Xiao, X Wang, X Tan, L He, X Zhu, S Zhao, T Lee | Proceedings of the 32nd ACM International Conference on Multimedia, 2099-2107, 2024 | 2 | 2024 |
Vec-Tok-VC+: Residual-enhanced Robust Zero-shot Voice Conversion with Progressive Constraints in a Dual-mode Training Strategy | L Ma, X Zhu, Y Lv, Z Wang, Z Wang, W He, H Zhou, L Xie | arXiv preprint arXiv:2406.09844, 2024 | 2 | 2024 |
HIGNN-TTS: Hierarchical Prosody Modeling With Graph Neural Networks for Expressive Long-Form TTS | D Guo, X Zhu, L Xue, T Li, Y Lv, Y Jiang, L Xie | 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-7, 2023 | 2 | 2023 |
Zero-Shot Emotion Transfer for Cross-Lingual Speech Synthesis | Y Li, X Zhu, Y Lei, H Li, J Liu, D Xie, L Xie | 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-8, 2023 | 2 | 2023 |
The NPU-HWC System for the ISCSLP 2024 Inspirational and Convincing Audio Generation Challenge | D Guo, J Yao, X Zhu, K Xia, Z Guo, Z Zhang, Y Wang, J Liu, L Xie | 2024 IEEE 14th International Symposium on Chinese Spoken Language Processing …, 2024 | 1 | 2024 |
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning | T Li, Z Wang, X Zhu, J Cong, Q Tian, Y Wang, L Xie | IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024 | 1 | 2024 |
Multi-Speaker Expressive Speech Synthesis via Semi-supervised Contrastive Learning | X Zhu, Y Li, Y Lei, N Jiang, G Zhao, L Xie | arXiv preprint arXiv:2310.17101, 2023 | 1 | 2023 |
CosyAudio: Improving Audio Generation with Confidence Scores and Synthetic Captions | X Zhu, W Tian, X Wang, L He, X Wang, S Zhao, L Xie | arXiv preprint arXiv:2501.16761, 2025 | | 2025 |
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia | X Geng, K Wei, Q Shao, S Liu, Z Lin, Z Zhao, G Li, W Tian, P Chen, Y Li, ... | arXiv preprint arXiv:2501.13306, 2025 | | 2025 |
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training | X Zhu, L He, Y Xiao, X Wang, X Tan, S Zhao, L Xie | arXiv preprint arXiv:2501.04416, 2025 | | 2025 |