PromptTTS: Controllable Text-to-Speech with Text Descriptions Z Guo, Y Leng, Y Wu, S Zhao, X Tan ICASSP 2023, 2022 | 101 | 2022 |
AdaSpeech 4: Adaptive text to speech in zero-shot scenarios Y Wu, X Tan, B Li, L He, S Zhao, R Song, T Qin, TY Liu InterSpeech 2022, 2022 | 72 | 2022 |
ResGrad: Residual denoising diffusion probabilistic models for text to speech Z Chen, Y Wu, Y Leng, J Chen, H Liu, X Tan, Y Cui, K Wang, L He, S Zhao, ... arXiv preprint arXiv:2212.14518, 2022 | 21 | 2022 |
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing Y Wu, J Guo, X Tan, C Zhang, B Li, R Song, L He, S Zhao, A Menezes, ... AAAI 2023, 2022 | 16 | 2022 |
Self-supervised context-aware style representation for expressive speech synthesis Y Wu, X Wang, S Zhang, L He, R Song, JY Nie InterSpeech 2022, 2022 | 16 | 2022 |
The Interspeech 2024 challenge on speech processing using discrete units X Chang, J Shi, J Tian, Y Wu, Y Tang, Y Wu, S Watanabe, Y Adi, X Chen, ... arXiv preprint arXiv:2406.07725, 2024 | 13 | 2024 |
ESPnet-Codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech J Shi, J Tian, Y Wu, J Jung, JQ Yip, Y Masuyama, W Chen, Y Wu, Y Tang, ... 2024 IEEE Spoken Language Technology Workshop (SLT), 562-569, 2024 | 7 | 2024 |
TiVA: Time-aligned video-to-audio generation X Wang, Y Wang, Y Wu, R Song, X Tan, Z Chen, H Xu, G Sui Proceedings of the 32nd ACM International Conference on Multimedia, 573-582, 2024 | 5 | 2024 |
YuLan: An open-source large language model Y Zhu, K Zhou, K Mao, W Chen, Y Sun, Z Chen, Q Cao, Y Wu, Y Chen, ... arXiv preprint arXiv:2406.19853, 2024 | 2 | 2024 |
SpeechComposer: Unifying multiple speech tasks with prompt composition Y Wu, S Maiti, Y Peng, W Zhang, C Li, Y Wang, X Wang, S Watanabe, ... arXiv preprint arXiv:2401.18045, 2024 | 2 | 2024 |
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts Y Wu, Y Peng, Y Lu, X Chang, R Song, S Watanabe 2024 IEEE Spoken Language Technology Workshop (SLT), 43-48, 2024 | 1 | 2024 |
LoVA: Long-form Video-to-Audio Generation X Cheng, X Wang, Y Wu, Y Wang, R Song arXiv preprint arXiv:2409.15157, 2024 | 1 | 2024 |
Text-to-speech synthesis in the wild J Jung, W Zhang, S Maiti, Y Wu, X Wang, JH Kim, Y Matsunaga, S Um, ... arXiv preprint arXiv:2409.08711, 2024 | 1 | 2024 |
Understanding Human Preferences: Towards More Personalized Video to Text Generation Y Wu, R Song, X Chen, H Jiang, Z Cao, J Yu Proceedings of the ACM Web Conference 2024, 3952-3963, 2024 | 1 | 2024 |
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild J Jung, Y Wu, X Wang, JH Kim, S Maiti, Y Matsunaga, H Shim, J Tian, ... IEEE Open Journal of Signal Processing, 2025 | | 2025 |
Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization Y Wu, Y Lu, Y Peng, X Wang, R Song, S Watanabe arXiv preprint arXiv:2412.19005, 2024 | | 2024 |
ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios Y Wang, H Xiao, Y Wu, R Song InterSpeech 2023, 2023 | | 2023 |