Pseudo Numerical Methods for Diffusion Models on Manifolds L Liu, Y Ren, Z Lin, Z Zhao International Conference on Learning Representations (ICLR 2022), 2022 | 578 | 2022 |
Make-an-audio: Text-to-audio generation with prompt-enhanced diffusion models R Huang, J Huang, D Yang, Y Ren, L Liu, M Li, Z Ye, J Liu, X Yin, Z Zhao International Conference on Machine Learning, 13916-13932, 2023 | 314 | 2023 |
PTQD: Accurate post-training quantization for diffusion models Y He, L Liu, J Liu, W Wu, H Zhou, B Zhuang Advances in Neural Information Processing Systems 36, 2024 | 77 | 2024 |
Make-a-voice: Unified voice synthesis with discrete representation R Huang, C Zhang, Y Wang, D Yang, L Liu, Z Ye, Z Jiang, C Weng, ... arXiv preprint arXiv:2305.19269, 2023 | 30 | 2023 |
Chat-3D v2: Bridging 3D scene and large language models with object identifiers H Huang, Z Wang, R Huang, L Liu, X Cheng, Y Zhao, T Jin, Z Zhao arXiv preprint arXiv:2312.08168, 2023 | 25 | 2023 |
Extending multi-modal contrastive representations Z Wang, Z Zhang, L Liu, Y Zhao, H Huang, T Jin, Z Zhao arXiv preprint arXiv:2310.08884, 2023 | 10 | 2023 |
ERA-Solver: Error-robust Adams solver for fast sampling of diffusion probabilistic models S Li, L Liu, Z Chai, R Li, X Tan arXiv preprint arXiv:2301.12935, 2023 | 10 | 2023 |
Detector guidance for multi-object text-to-image generation L Liu, Z Zhang, Y Ren, R Huang, X Yin, Z Zhao arXiv preprint arXiv:2306.02236, 2023 | 9 | 2023 |
FreeBind: Free Lunch in Unified Multimodal Space via Knowledge Fusion Z Wang, Z Zhang, X Cheng, R Huang, L Liu, Z Ye, H Huang, Y Zhao, T Jin, ... Forty-first International Conference on Machine Learning, 2024 | 8 | 2024 |
Diffusion denoising process for perceptron bias in out-of-distribution detection L Liu, Y Ren, X Cheng, R Huang, C Li, Z Zhao arXiv preprint arXiv:2211.11255, 2022 | 7 | 2022 |
OmniBind: Large-scale omni multimodal representation via binding spaces Z Wang, Z Zhang, H Zhang, L Liu, R Huang, X Cheng, H Zhao, Z Zhao arXiv preprint arXiv:2407.11895, 2024 | 6 | 2024 |
Molecule-Space: Free Lunch in Unified Multimodal Space via Knowledge Fusion Z Wang, Z Zhang, X Cheng, R Huang, L Liu, Z Ye, H Huang, Y Zhao, T Jin, ... arXiv preprint arXiv:2405.04883, 2024 | 6 | 2024 |
Make-a-voice: Revisiting voice large language models as scalable multilingual and multitask learners R Huang, C Zhang, Y Wang, D Yang, J Tian, Z Ye, L Liu, Z Wang, Z Jiang, ... Proceedings of the 62nd Annual Meeting of the Association for Computational …, 2024 | 4 | 2024 |
VoiceTuner: Self-Supervised Pre-training and Efficient Fine-tuning For Voice Generation R Huang, Y Wang, R Hu, X Xu, Z Hong, D Yang, X Cheng, Z Wang, ... Proceedings of the 32nd ACM International Conference on Multimedia, 10630-10639, 2024 | 2 | 2024 |
InstructSpeech: Following Speech Editing Instructions via Large Language Models R Huang, R Hu, Y Wang, Z Wang, X Cheng, Z Jiang, Z Ye, D Yang, L Liu, ... Forty-first International Conference on Machine Learning, 2024 | 2 | 2024 |
Improving Long-Text Alignment for Text-to-Image Diffusion Models L Liu, C Du, T Pang, Z Wang, C Li, D Xu arXiv preprint arXiv:2410.11817, 2024 | | 2024 |