Lyricwhiz: Robust multilingual lyrics transcription by whispering to chatgpt L Zhuo, R Yuan, J Pan, Y Ma, Y Li, G Zhang, S Liu, R Dannenberg, J Fu, ... International Society for Music Information Retrieval Conference (ISMIR), 2023 | 22* | 2023 |
ComposerX: Multi-Agent Symbolic Music Composition with LLMs Q Deng, Q Yang, R Yuan, Y Huang, Y Wang, X Liu, Z Tian, J Pan, ... International Society for Music Information Retrieval Conference (ISMIR), 2024 | 13 | 2024 |
FlashSpeech: Efficient Zero-Shot Speech Synthesis Z Ye, Z Ju, H Liu, X Tan, J Chen, Y Lu, P Sun, J Pan, W Bian, S He, Q Liu, ... ACM MM 2024, 2024 | 12 | 2024 |
Vidmuse: A simple video-to-music generation framework with long-short-term modeling Z Tian, Z Liu, R Yuan, J Pan, Q Liu, X Tan, Q Chen, W Xue, Y Guo arXiv preprint arXiv:2406.04321, 2024 | 6 | 2024 |
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation X Qi, J Pan, P Li, R Yuan, X Chi, M Li, W Luo, W Xue, S Zhang, Q Liu, ... CVPR 2024, 2024 | 6 | 2024 |
Codec does matter: Exploring the semantic shortcoming of codec for audio language model Z Ye, P Sun, J Lei, H Lin, X Tan, Z Dai, Q Kong, J Chen, J Pan, Q Liu, ... arXiv preprint arXiv:2408.17175, 2024 | 3 | 2024 |
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions X Chi, Y Wang, A Cheng, P Fang, Z Tian, Y He, Z Liu, X Qi, J Pan, ... arXiv preprint arXiv:2407.20962, 2024 | 1 | 2024 |
CoCoGesture: Toward Coherent Co-speech 3D Gesture Generation in the Wild X Qi, H Zhang, Y Wang, J Pan, C Liu, P Li, X Chi, M Li, Q Zhang, W Xue, ... arXiv preprint arXiv:2405.16874, 2024 | | 2024 |