Title | Authors | Venue | Cited by | Year |
Textrolspeech: A text style control speech corpus with codec language text-to-speech models | S Ji, J Zuo, M Fang, Z Jiang, F Chen, X Duan, B Huai, Z Zhao | ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 32 | 2024 |
Wavtokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling | S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo, Q Yang, X Cheng, Z Wang, ... | arXiv preprint arXiv:2408.16532, 2024 | 21 | 2024 |
Language-codec: Reducing the gaps between discrete codec representation and speech language models | S Ji, M Fang, Z Jiang, S Zheng, Q Chen, R Huang, J Zuo, S Wang, Z Zhao | arXiv preprint arXiv:2402.12208, 2024 | 15 | 2024 |
Controlspeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style control with decoupled codec | S Ji, J Zuo, W Wang, M Fang, S Zheng, Q Chen, Z Jiang, H Huang, ... | arXiv preprint arXiv:2406.01205, 2024 | 7 | 2024 |
Wavchat: A survey of spoken dialogue models | S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang, Z Jiang, L Zhou, S Liu, ... | arXiv preprint arXiv:2411.13577, 2024 | 6 | 2024 |
Synctalklip: Highly synchronized lip-readable speaker generation with multi-task learning | X Yang, X Cheng, D Fu, M Fang, J Zuo, S Ji, Z Zhao, J Tao | Proceedings of the 32nd ACM International Conference on Multimedia, 8149-8158, 2024 | 4 | 2024 |
Ace: A generative cross-modal retrieval framework with coarse-to-fine semantic modeling | M Fang, S Ji, J Zuo, H Huang, Y Xia, J Zhu, X Cheng, X Yang, W Liu, ... | arXiv preprint arXiv:2406.17507, 2024 | 4 | 2024 |
Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling | X Fang, Z Huang, Z Tian, M Fang, Z Pan, Q Fang, Z Wen, H Pan, D Li | arXiv preprint arXiv:2409.11283, 2024 | 1 | 2024 |
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios | X Cheng, D Fu, X Yang, M Fang, R Hu, J Lu, B Jionghao, Z Wang, S Ji, ... | arXiv preprint arXiv:2501.01384, 2025 | | 2025 |
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation | W Liu, J Bai, X Cheng, J Zuo, Z Jiang, S Ji, M Fang, X Yang, Q Yang, ... | Proceedings of the 31st International Conference on Computational …, 2025 | | 2025 |
Speech Watermarking with Discrete Intermediate Representations | S Ji, Z Jiang, J Zuo, M Fang, Y Chen, T Jin, Z Zhao | arXiv preprint arXiv:2412.13917, 2024 | | 2024 |
MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence | F You, M Fang, L Tang, R Huang, Y Wang, Z Zhao | arXiv preprint arXiv:2411.01805, 2024 | | 2024 |
AudioVSR: Enhancing Video Speech Recognition with Audio Data | X Yang, X Cheng, J Duan, H Qiu, M Hong, M Fang, S Ji, J Zuo, Z Hong, ... | Proceedings of the 2024 Conference on Empirical Methods in Natural Language …, 2024 | | 2024 |
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup | X Cheng, S Zheng, Z Wang, M Fang, Z Zhang, R Huang, Z Ma, S Ji, J Zuo, ... | arXiv preprint arXiv:2410.21269, 2024 | | 2024 |
AVSET-10M: An Open Large-Scale Audio-Visual Dataset with High Correspondence | X Cheng, Z Zhang, Z Wang, M Fang, R Huang, S Zheng, R Hu, ... | | | 2024 |
Advancing Multimodal Unified Discrete Representations | H Huang, Y Xia, S Ji, S Wang, H Wang, M Fang, J Zhu, Z Dong, Z Wang, ... | | | |
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control | S Ji, J Zuo, W Wang, M Fang, Q Chen, Z Jiang, H Huang, Z Wang, ... | | | |
MindLoc: A Secure Brain-Based System for Object Localization | X Yang, X Cheng, JY Lu, H Qiu, M Fang, W Yan, Z Jiang, J Zuo, S Ji, ... | | | |