Yihan Wu
Title
Cited by
Year
PromptTTS: Controllable Text-to-Speech with Text Descriptions
Z Guo, Y Leng, Y Wu, S Zhao, X Tan
ICASSP 2023, 2022
Cited by: 101; Year: 2022
AdaSpeech 4: Adaptive text to speech in zero-shot scenarios
Y Wu, X Tan, B Li, L He, S Zhao, R Song, T Qin, TY Liu
InterSpeech 2022, 2022
Cited by: 72; Year: 2022
ResGrad: Residual denoising diffusion probabilistic models for text to speech
Z Chen, Y Wu, Y Leng, J Chen, H Liu, X Tan, Y Cui, K Wang, L He, S Zhao, ...
arXiv preprint arXiv:2212.14518, 2022
Cited by: 21; Year: 2022
VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing
Y Wu, J Guo, X Tan, C Zhang, B Li, R Song, L He, S Zhao, A Menezes, ...
AAAI 2023, 2022
Cited by: 16; Year: 2022
Self-supervised context-aware style representation for expressive speech synthesis
Y Wu, X Wang, S Zhang, L He, R Song, JY Nie
InterSpeech 2022, 2022
Cited by: 16; Year: 2022
The Interspeech 2024 challenge on speech processing using discrete units
X Chang, J Shi, J Tian, Y Wu, Y Tang, Y Wu, S Watanabe, Y Adi, X Chen, ...
arXiv preprint arXiv:2406.07725, 2024
Cited by: 13; Year: 2024
ESPnet-Codec: Comprehensive training and evaluation of neural codecs for audio, music, and speech
J Shi, J Tian, Y Wu, J Jung, JQ Yip, Y Masuyama, W Chen, Y Wu, Y Tang, ...
2024 IEEE Spoken Language Technology Workshop (SLT), 562-569, 2024
Cited by: 7; Year: 2024
TiVA: Time-aligned video-to-audio generation
X Wang, Y Wang, Y Wu, R Song, X Tan, Z Chen, H Xu, G Sui
Proceedings of the 32nd ACM International Conference on Multimedia, 573-582, 2024
Cited by: 5; Year: 2024
YuLan: An open-source large language model
Y Zhu, K Zhou, K Mao, W Chen, Y Sun, Z Chen, Q Cao, Y Wu, Y Chen, ...
arXiv preprint arXiv:2406.19853, 2024
Cited by: 2; Year: 2024
SpeechComposer: Unifying multiple speech tasks with prompt composition
Y Wu, S Maiti, Y Peng, W Zhang, C Li, Y Wang, X Wang, S Watanabe, ...
arXiv preprint arXiv:2401.18045, 2024
Cited by: 2; Year: 2024
Robust Audiovisual Speech Recognition Models with Mixture-of-Experts
Y Wu, Y Peng, Y Lu, X Chang, R Song, S Watanabe
2024 IEEE Spoken Language Technology Workshop (SLT), 43-48, 2024
Cited by: 1; Year: 2024
LoVA: Long-form Video-to-Audio Generation
X Cheng, X Wang, Y Wu, Y Wang, R Song
arXiv preprint arXiv:2409.15157, 2024
Cited by: 1; Year: 2024
Text-to-speech synthesis in the wild
J Jung, W Zhang, S Maiti, Y Wu, X Wang, JH Kim, Y Matsunaga, S Um, ...
arXiv preprint arXiv:2409.08711, 2024
Cited by: 1; Year: 2024
Understanding Human Preferences: Towards More Personalized Video to Text Generation
Y Wu, R Song, X Chen, H Jiang, Z Cao, J Yu
Proceedings of the ACM Web Conference 2024, 3952-3963, 2024
Cited by: 1; Year: 2024
SpoofCeleb: Speech Deepfake Detection and SASV In The Wild
J Jung, Y Wu, X Wang, JH Kim, S Maiti, Y Matsunaga, H Shim, J Tian, ...
IEEE Open Journal of Signal Processing, 2025
Year: 2025
Enhancing Audiovisual Speech Recognition through Bifocal Preference Optimization
Y Wu, Y Lu, Y Peng, X Wang, R Song, S Watanabe
arXiv preprint arXiv:2412.19005, 2024
Year: 2024
ComedicSpeech: Text To Speech For Stand-up Comedies in Low-Resource Scenarios
Y Wang, H Xiao, Y Wu, R Song
InterSpeech 2023, 2023
Year: 2023