GraphText: Graph reasoning in text space J Zhao, L Zhuo, Y Shen, M Qu, K Liu, M Bronstein, Z Zhu, J Tang arXiv preprint arXiv:2310.01089, 2023 | 57 | 2023 |
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers P Gao, L Zhuo, Z Lin, C Liu, J Chen, R Du, E Xie, X Luo, L Qiu, Y Zhang, ... arXiv preprint arXiv:2405.05945, 2024 | 54* | 2024 |
Video background music generation: Dataset, method and evaluation L Zhuo, Z Wang, B Wang, Y Liao, C Bao, S Peng, S Han, A Zhang, F Fang, ... Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2023 | 28 | 2023 |
Lumina-mGPT: Illuminate flexible photorealistic text-to-image generation with multimodal generative pretraining D Liu, S Zhao, L Zhuo, W Lin, Y Qiao, H Li, P Gao arXiv preprint arXiv:2408.02657, 2024 | 26 | 2024 |
MARBLE: Music audio representation benchmark for universal evaluation R Yuan, Y Ma, Y Li, G Zhang, X Chen, H Yin, Y Liu, J Huang, Z Tian, ... Advances in Neural Information Processing Systems 36, 39626-39647, 2023 | 23 | 2023 |
LyricWhiz: Robust multilingual lyrics transcription by whispering to ChatGPT L Zhuo, R Yuan, J Pan, Y Ma, Y Li, G Zhang, S Liu, R Dannenberg, J Fu, ... International Society for Music Information Retrieval Conference (ISMIR), 2023 | 22* | 2023 |
DiffDance: Cascaded human motion diffusion model for dance generation Q Qi, L Zhuo, A Zhang, Y Liao, F Fang, S Liu, S Yan Proceedings of the 31st ACM International Conference on Multimedia, 1374-1382, 2023 | 19 | 2023 |
Lumina-Next: Making Lumina-T2X stronger and faster with Next-DiT L Zhuo, R Du, H Xiao, Y Li, D Liu, R Huang, W Liu, L Zhao, FY Wang, ... arXiv preprint arXiv:2406.18583, 2024 | 17* | 2024 |
ProtLLM: An interleaved protein-language LLM with protein-as-word pre-training L Zhuo, Z Chi, M Xu, H Huang, H Zheng, C He, XL Mao, W Zhang arXiv preprint arXiv:2403.07920, 2024 | 14 | 2024 |
LLMs as visual explainers: Advancing image classification with evolving visual descriptions S Han, L Zhuo, Y Liao, S Liu arXiv preprint arXiv:2311.11904, 2023 | 10 | 2023 |
LLaVA-MoD: Making LLaVA tiny via MoE knowledge distillation F Shu, Y Liao, L Zhuo, C Xu, L Zhang, G Zhang, H Shi, L Chen, T Zhong, ... arXiv preprint arXiv:2408.15881, 2024 | 9 | 2024 |
Customize your visual autoregressive recipe with set autoregressive modeling W Liu, L Zhuo, Y Xin, S Xia, P Gao, X Yue arXiv preprint arXiv:2410.10511, 2024 | 5 | 2024 |
PixWizard: Versatile image-to-image visual assistant with open-language instructions W Lin, X Wei, R Zhang, L Zhuo, S Zhao, S Huang, J Xie, Y Qiao, P Gao, ... arXiv preprint arXiv:2409.15278, 2024 | 3 | 2024 |
I-Max: Maximize the resolution potential of pre-trained rectified flow transformers with projected flow R Du, D Liu, L Zhuo, Q Qi, H Li, Z Ma, P Gao arXiv preprint arXiv:2410.07536, 2024 | 1 | 2024 |
IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models J Lei, R Zhang, X Hu, W Lin, Z Li, W Sun, R Du, L Zhuo, Z Li, X Li, S Zhao, ... arXiv preprint arXiv:2501.13920, 2025 | | 2025 |
Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation B Wang, L Zhuo, Z Wang, C Bao, W Chengjing, X Nie, J Dai, J Han, ... arXiv preprint arXiv:2412.09428, 2024 | | 2024 |
VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection S Han, W Huang, H Shi, L Zhuo, X Su, S Zhang, X Zhou, X Qi, Y Liao, ... arXiv preprint arXiv:2411.14794, 2024 | | 2024 |