Dongchao Yang

Cited by

	All	Since 2020
Citations	1722	1721
h-index	18	18
i10-index	20	20

Citations per year: 2021: 16, 2022: 51, 2023: 317, 2024: 1184, 2025: 142

Public access

10 articles

0 articles

available

not available

Based on funding mandates

Co-authors

Yuexian Zou	Peking University Shenzhen Graduate School	Verified email at pku.edu.cn
Rongjie Huang	FAIR, Zhejiang University	Verified email at zju.edu.cn
Helin Wang	Johns Hopkins University	Verified email at jh.edu
Xu Tan	Principal Researcher and Research Manager, Microsoft	Verified email at microsoft.com
Yi Ren (任意)	Research Scientist, Tiktok	Verified email at bytedance.com
Jinchuan Tian	Language Technologies Institute, Carnegie Mellon University	Verified email at andrew.cmu.edu
Jiatong Shi (史嘉彤)	Carnegie Mellon University	Verified email at andrew.cmu.edu
Dong Yu (俞栋)	Distinguished Scientist @ Tencent AI Lab, ACM/IEEE/ISCA Fellow	Verified email at global.tencent.com
Haohan Guo	Chinese University of Hong Kong	Verified email at se.cuhk.edu.hk
Yifei Xin	Peking University	Verified email at stu.pku.edu.cn
Wenwu Wang	Professor, University of Surrey, UK	Verified email at surrey.ac.uk
Nuo Chen	Hong Kong University of Science and Technology	Verified email at connect.ust.hk
Haibin Wu	Microsoft	Verified email at microsoft.com
Songxiang Liu
Chao Weng

Dongchao Yang

Chinese University of Hong Kong

Verified email at se.cuhk.edu.hk - Homepage

TTS, TTA, Audio Codec, Multi-modal, Audio Foundation Models


Title	Authors	Venue	Cited by	Year
Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models	R Huang, J Huang, D Yang*, Y Ren, L Liu, M Li, Z Ye, J Liu, X Yin, ...	ICML 2023	316	2023
Diffsound: Discrete diffusion model for text-to-sound generation	D Yang, J Yu, H Wang, W Wang, C Weng, Y Zou, D Yu	IEEE Transactions on Audio, Speech and Language Processing (TASLP)	310	2023
AudioGPT: Understanding and generating speech, music, sound, and talking head	R Huang, M Li, D Yang, J Shi, X Chang, Z Ye, Y Wu, Z Hong, J Huang, ...	AAAI, demo 2024	184	2023
NaturalSpeech 3: Zero-shot speech synthesis with factorized codec and diffusion models	Z Ju, Y Wang, K Shen, X Tan, D Xin, D Yang, Y Liu, Y Leng, K Song, ...	ICML 2024	130	2024
UniAudio: An Audio Foundation Model Toward Universal Audio Generation	D Yang, J Tian, X Tan, R Huang, S Liu, X Chang, J Shi, S Zhao, J Bian, ...	ICML 2024	110	2023
Hifi-codec: Group-residual vector quantization for high fidelity audio codec	D Yang, S Liu, R Huang, J Tian, C Weng, Y Zou	arXiv preprint arXiv:2305.02765	107	2023
InstructTTS: Modelling expressive TTS in discrete latent space with natural language style prompt	D Yang, S Liu, R Huang, C Weng, H Meng	IEEE Transactions on Audio, Speech and Language Processing (TASLP)	86	2024
Make-an-audio 2: Temporal-enhanced text-to-audio generation	J Huang, Y Ren, R Huang, D Yang, Z Ye, C Zhang, J Liu, X Yin, Z Ma, ...	arXiv preprint arXiv:2305.18474	57	2023
Towards data distillation for end-to-end spoken conversational question answering	C You, N Chen, F Liu, D Yang, Y Zou	arXiv preprint arXiv:2010.08923	41	2021
Prompttts 2: Describing and generating voices with text prompt	Y Leng, Z Guo, K Shen, X Tan, Z Ju, Y Liu, Y Liu, D Yang, L Zhang, ...	ICLR 2024	39	2023
A Mutual learning framework for Few-shot Sound Event Detection	D Yang, H Wang, Y Zou, Z Ye, W Wang	ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …	39*	2022
Improving Text-Audio Retrieval by Text-aware Attention Pooling and Prior Matrix Revised Loss	Y Xin, D Yang, Y Zou	ICASSP 2023	35	2023
Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information	Z Ye, H Wang, D Yang, Y Zou	Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE)	35	2021
Make-a-voice: Unified voice synthesis with discrete representation	R Huang, C Zhang, Y Wang, D Yang, L Liu, Z Ye, Z Jiang, C Weng, ...	ACL 2024	30	2023
Rall-e: Robust codec language modeling with chain-of-thought prompting for text-to-speech synthesis	D Xin, X Tan, K Shen, Z Ju, D Yang, Y Wang, S Takamichi, H Saruwatari, ...	arXiv preprint arXiv:2404.03204	24	2024
Target Confusion in End-to-end Speaker Extraction: Analysis and Approaches	Z Zhao, D Yang, R Gu, H Zhang, Y Zou	Interspeech 2022	22	2022
Norespeech: Knowledge distillation based conditional diffusion model for noise-robust expressive tts	D Yang, S Liu, J Yu, H Wang, C Weng, Y Zou	Interspeech 2023	19	2022
Audio Pyramid Transformer with Domain Adaption for Weakly Supervised Sound Event Detection and Audio Classification	Y Xin, D Yang, Y Zou	Proc. Interspeech 2022, 1546-1550	19	2022
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models	D Yang, D Wang, H Guo, X Chen, X Wu, H Meng	Interspeech 2024	14	2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner	D Yang, H Guo, Y Wang, R Huang, X Li, X Tan, X Wu, H Meng	NIPS 2024	11	2024
