Rongjie Huang
Verified email at zju.edu.cn
Title · Cited by · Year
Make-An-Audio: Text-to-audio generation with prompt-enhanced diffusion models
R Huang, J Huang, D Yang, Y Ren, L Liu, M Li, Z Ye, J Liu, X Yin, Z Zhao
ICML 2023, 2023
335 · 2023
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
R Huang, Z Zhao, H Liu, J Liu, C Cui, Y Ren
ACM MM 2022, 2022
187 · 2022
AudioGPT: Understanding and generating speech, music, sound, and talking head
R Huang, M Li, D Yang, J Shi, X Chang, Z Ye, Y Wu, Z Hong, J Huang, ...
Proceedings of the AAAI Conference on Artificial Intelligence 38 (21), 23802 …, 2024
182 · 2024
FastDiff: A Fast Conditional Diffusion Model for High-Quality Speech Synthesis
R Huang, MWY Lam, J Wang, D Su, D Yu, Y Ren, Z Zhao
IJCAI 2022, 2022
166 · 2022
Bilateral denoising diffusion models
MWY Lam, J Wang, R Huang, D Su, D Yu
arXiv preprint arXiv:2108.11514, 2021
142* · 2021
HiFi-Codec: Group-residual vector quantization for high fidelity audio codec
D Yang, S Liu, R Huang, J Tian, C Weng, Y Zou
arXiv preprint arXiv:2305.02765, 2023
115 · 2023
UniAudio: An audio foundation model toward universal audio generation
D Yang, J Tian, X Tan, R Huang, S Liu, X Chang, J Shi, S Zhao, J Bian, ...
arXiv preprint arXiv:2310.00704, 2023
113 · 2023
Multi-Singer: Fast multi-singer singing voice vocoder with a large-scale corpus
R Huang, F Chen, Y Ren, J Liu, C Cui, Z Zhao
ACM MM 2021, 3945-3954, 2021
107 · 2021
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
R Huang, Y Ren, J Liu, C Cui, Z Zhao
NeurIPS 2022, 2022
99 · 2022
M4Singer: a Multi-Style, Multi-Singer and Musical Score Provided Mandarin Singing Corpus
L Zhang, R Li, S Wang, L Deng, J Liu, Y Ren, J He, R Huang, J Zhu, ...
NeurIPS 2022, 2022
81 · 2022
InstructTTS: Modelling expressive TTS in discrete latent space with natural language style prompt
D Yang, S Liu, R Huang, C Weng, H Meng
IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2024
75 · 2024
SingGAN: Generative Adversarial Network For High-Fidelity Singing Voice Generation
R Huang, C Cui, F Chen, Y Ren, J Liu, Z Zhao, B Huai, Z Wang
ACM MM 2022, 2022
63 · 2022
Make-An-Audio 2: Temporal-enhanced text-to-audio generation
J Huang, Y Ren, R Huang, D Yang, Z Ye, C Zhang, J Liu, X Yin, Z Ma, ...
arXiv preprint arXiv:2305.18474, 2023
60 · 2023
TranSpeech: Speech-to-Speech Translation With Bilateral Perturbation
R Huang, Z Zhao, J Liu, H Liu, Y Ren, L Zhang, J He
ICLR 2023, 2022
47 · 2022
GeneFace++: Generalized and stable real-time audio-driven 3D talking face generation
Z Ye, J He, Z Jiang, R Huang, J Huang, J Liu, Y Ren, X Yin, Z Ma, Z Zhao
arXiv preprint arXiv:2305.00787, 2023
37 · 2023
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model
C Cui, Y Ren, J Liu, F Chen, R Huang, M Lei, Z Zhao
Interspeech 2021, 2021
37 · 2021
Make-A-Voice: Revisiting voice large language models as scalable multilingual and multitask learners
R Huang, C Zhang, Y Wang, D Yang, J Tian, Z Ye, L Liu, Z Wang, Z Jiang, ...
Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024
34* · 2024
Real3D-Portrait: One-shot realistic 3D talking portrait synthesis
Z Ye, T Zhong, Y Ren, J Yang, W Li, J Huang, Z Jiang, J He, R Huang, ...
arXiv preprint arXiv:2401.08503, 2024
31 · 2024
Chat-Scene: Bridging 3D Scene and Large Language Models with Object Identifiers
H Huang, Y Chen, Z Wang, R Huang, R Xu, T Wang, L Liu, X Cheng, ...
arXiv preprint arXiv:2312.08168, 2023
28 · 2023
WavTokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling
S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo, Q Yang, X Cheng, Z Wang, ...
arXiv preprint arXiv:2408.16532, 2024
27 · 2024
Articles 1–20