Sheng Zhao
Verified email at microsoft.com
Title
Cited by
Year
Fastspeech 2: Fast and high-quality end-to-end text to speech
Y Ren, C Hu, X Tan, T Qin, S Zhao, Z Zhao, TY Liu
arXiv preprint arXiv:2006.04558, 2020
Cited by: 1610; Year: 2020
Fastspeech: Fast, robust and controllable text to speech
Y Ren, Y Ruan, X Tan, T Qin, S Zhao, Z Zhao, TY Liu
Advances in neural information processing systems 32, 2019
Cited by: 1279; Year: 2019
Neural speech synthesis with transformer network
N Li, S Liu, Y Liu, S Zhao, M Liu
Proceedings of the AAAI conference on artificial intelligence 33 (01), 6706-6713, 2019
Cited by: 918; Year: 2019
Neural codec language models are zero-shot text to speech synthesizers
C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu, Z Chen, Y Liu, H Wang, ...
arXiv preprint arXiv:2301.02111, 2023
Cited by: 643; Year: 2023
Adaspeech: Adaptive text to speech for custom voice
M Chen, X Tan, B Li, Y Liu, T Qin, S Zhao, TY Liu
arXiv preprint arXiv:2103.00993, 2021
Cited by: 229; Year: 2021
NaturalSpeech: End-to-End Text-to-Speech Synthesis With Human-Level Quality
X Tan, J Chen, H Liu, J Cong, C Zhang, Y Liu, X Wang, Y Leng, Y Yi, L He, ...
IEEE Transactions on Pattern Analysis and Machine Intelligence 46 (6), 4234-4245, 2024
Cited by: 227; Year: 2024
Naturalspeech 2: Latent diffusion models are natural and zero-shot speech and singing synthesizers
K Shen, Z Ju, X Tan, Y Liu, Y Leng, L He, T Qin, S Zhao, J Bian
arXiv preprint arXiv:2304.09116, 2023
Cited by: 226; Year: 2023
Speak foreign languages with your own voice: Cross-lingual neural codec language modeling
Z Zhang, L Zhou, C Wang, S Chen, Y Wu, S Liu, Z Chen, Y Liu, H Wang, ...
arXiv preprint arXiv:2303.03926, 2023
Cited by: 164; Year: 2023
Hyper-structure recurrent neural networks for text-to-speech
P Zhao, M Leung, K Yao, B Yan, S Zhao, FA Alleva
US Patent 10,127,901, 2018
Cited by: 163; Year: 2018
Naturalspeech 3: Zero-shot speech synthesis with factorized codec and diffusion models
Z Ju, Y Wang, K Shen, X Tan, D Xin, D Yang, Y Liu, Y Leng, K Song, ...
arXiv preprint arXiv:2403.03100, 2024
Cited by: 143; Year: 2024
Almost unsupervised text to speech and automatic speech recognition
Y Ren, X Tan, T Qin, S Zhao, Z Zhao, TY Liu
International conference on machine learning, 5410-5419, 2019
Cited by: 129; Year: 2019
Multispeech: Multi-speaker text to speech with transformer
M Chen, X Tan, Y Ren, J Xu, H Sun, S Zhao, T Qin, TY Liu
arXiv preprint arXiv:2006.04664, 2020
Cited by: 125; Year: 2020
Close to human quality TTS with transformer
N Li, S Liu, Y Liu, S Zhao, M Liu, M Zhou
arXiv preprint arXiv:1809.08895, 2018
Cited by: 122; Year: 2018
Developing RNN-T models surpassing high-performance hybrid models with customization capability
J Li, R Zhao, Z Meng, Y Liu, W Wei, S Parthasarathy, V Mazalov, Z Wang, ...
arXiv preprint arXiv:2007.15188, 2020
Cited by: 118; Year: 2020
MBNet: MOS prediction for synthesized speech with mean-bias network
Y Leng, X Tan, S Zhao, F Soong, XY Li, T Qin
ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021
Cited by: 110; Year: 2021
Uniaudio: An audio foundation model toward universal audio generation
D Yang, J Tian, X Tan, R Huang, S Liu, X Chang, J Shi, S Zhao, J Bian, ...
arXiv preprint arXiv:2310.00704, 2023
Cited by: 108; Year: 2023
Prompttts: Controllable text-to-speech with text descriptions
Z Guo, Y Leng, Y Wu, S Zhao, X Tan
ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023
Cited by: 102; Year: 2023
Lrspeech: Extremely low-resource speech synthesis and recognition
J Xu, X Tan, Y Ren, T Qin, J Li, S Zhao, TY Liu
Proceedings of the 26th ACM SIGKDD International Conference on Knowledge …, 2020
Cited by: 101; Year: 2020
Dilated residual network with multi-head self-attention for speech emotion recognition
R Li, Z Wu, J Jia, S Zhao, H Meng
ICASSP 2019-2019 IEEE international conference on acoustics, speech and …, 2019
Cited by: 94; Year: 2019
Token-level ensemble distillation for grapheme-to-phoneme conversion
H Sun, X Tan, JW Gan, H Liu, S Zhao, T Qin, TY Liu
arXiv preprint arXiv:1904.03446, 2019
Cited by: 81; Year: 2019