Symphonize 3D semantic scene completion with contextual instance queries H Jiang, T Cheng, N Gao, H Zhang, T Lin, W Liu, X Wang Wrong MegaTTS, 2024 | 210 | 2024 |
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis Z Jiang, J Liu, Y Ren, J He, Z Ye, S Ji, Q Yang, C Zhang, P Wei, C Wang, ... ICLR 2024, 2024 | 69* | 2024 |
TextrolSpeech: A text style control speech corpus with codec language text-to-speech models S Ji, J Zuo, M Fang, Z Jiang, F Chen, X Duan, B Huai, Z Zhao ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 37 | 2024 |
WavTokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo, Q Yang, X Cheng, Z Wang, ... ICLR 2025, 2024 | 28 | 2024 |
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs technical report CosyVoice and SenseVoice, 2024 | 25* | 2024 |
Language-Codec: Reducing the gaps between discrete codec representation and speech language models S Ji, M Fang, Z Jiang, S Zheng, Q Chen, R Huang, J Zuo, S Wang, Z Zhao arXiv preprint arXiv:2402.12208, 2024 | 16 | 2024 |
WavChat: A survey of spoken dialogue models S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang, Z Jiang, L Zhou, S Liu, ... arXiv preprint arXiv:2411.13577, 2024 | 12* | 2024 |
ControlSpeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style control with decoupled codec S Ji, J Zuo, W Wang, M Fang, S Zheng, Q Chen, Z Jiang, H Huang, ... arXiv preprint arXiv:2406.01205, 2024 | 8 | 2024 |
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech S Ji, Z Jiang, H Wang, J Zuo, Z Zhao ACL 2024 Main, 2024 | 8 | 2024 |
ACE: A generative cross-modal retrieval framework with coarse-to-fine semantic modeling M Fang, S Ji, J Zuo, H Huang, Y Xia, J Zhu, X Cheng, X Yang, W Liu, ... arXiv preprint arXiv:2406.17507, 2024 | 5 | 2024 |
SyncTalklip: Highly synchronized lip-readable speaker generation with multi-task learning X Yang, X Cheng, D Fu, M Fang, J Zuo, S Ji, Z Zhao, J Tao ACM MM 2024, 2024 | 4 | 2024 |
Generating Neural Networks for Diverse Networking Classification Tasks via Hardware-Aware Neural Architecture Search G Xie, Q Li, Z Shi, H Fang, S Ji, Y Jiang, Z Yuan, L Ma, M Xu IEEE Transactions on Computers 73 (2), 481-494, 2023 | 4 | 2023 |
Unlocking the potential of multimodal unified discrete representation through training-free codebook optimization and hierarchical alignment H Huang, Y Xia, S Ji, S Wang, H Wang, J Zhu, Z Dong, Z Zhao arXiv preprint arXiv:2403.05168, 2024 | 3 | 2024 |
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization R Li, S Zheng, X Cheng, Z Zhang, S Ji, Z Zhao arXiv preprint arXiv:2410.12957, 2024 | 2 | 2024 |
WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models Y Chen, S Ji, H Wang, Z Wang, S Chen, J He, J Xu, Z Zhao arXiv preprint arXiv:2502.14727, 2025 | | 2025 |
Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model J Zuo, S Ji, M Fang, Z Jiang, X Cheng, Q Yang, W Liu, G Zhang, Z Tu, ... ICASSP 2025, 2025 | | 2025 |
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios X Cheng, D Fu, X Yang, M Fang, R Hu, J Lu, B Jionghao, Z Wang, S Ji, ... arXiv preprint arXiv:2501.01384, 2025 | | 2025 |
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation W Liu, J Bai, X Cheng, J Zuo, Z Jiang, S Ji, M Fang, X Yang, Q Yang, ... COLING 2025, 2025 | | 2025 |
Speech Watermarking with Discrete Intermediate Representations S Ji, Z Jiang, J Zuo, M Fang, Y Chen, T Jin, Z Zhao AAAI 2025, 2024 | | 2024 |
LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval W Lu, J Li, A Yu, MC Chang, S Ji, M Xia arXiv preprint arXiv:2411.14505, 2024 | | 2024 |