Minghui Fang
Verified email at zju.edu.cn
Title
Cited by
Textrolspeech: A text style control speech corpus with codec language text-to-speech models
S Ji, J Zuo, M Fang, Z Jiang, F Chen, X Duan, B Huai, Z Zhao
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by: 32, Year: 2024
WavTokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling
S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo, Q Yang, X Cheng, Z Wang, ...
arXiv preprint arXiv:2408.16532, 2024
Cited by: 21, Year: 2024
Language-codec: Reducing the gaps between discrete codec representation and speech language models
S Ji, M Fang, Z Jiang, S Zheng, Q Chen, R Huang, J Zuo, S Wang, Z Zhao
arXiv preprint arXiv:2402.12208, 2024
Cited by: 15, Year: 2024
ControlSpeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style control with decoupled codec
S Ji, J Zuo, W Wang, M Fang, S Zheng, Q Chen, Z Jiang, H Huang, ...
arXiv preprint arXiv:2406.01205, 2024
Cited by: 7, Year: 2024
Wavchat: A survey of spoken dialogue models
S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang, Z Jiang, L Zhou, S Liu, ...
arXiv preprint arXiv:2411.13577, 2024
Cited by: 6, Year: 2024
Synctalklip: Highly synchronized lip-readable speaker generation with multi-task learning
X Yang, X Cheng, D Fu, M Fang, J Zuo, S Ji, Z Zhao, J Tao
Proceedings of the 32nd ACM International Conference on Multimedia, 8149-8158, 2024
Cited by: 4, Year: 2024
Ace: A generative cross-modal retrieval framework with coarse-to-fine semantic modeling
M Fang, S Ji, J Zuo, H Huang, Y Xia, J Zhu, X Cheng, X Yang, W Liu, ...
arXiv preprint arXiv:2406.17507, 2024
Cited by: 4, Year: 2024
Zero-resource Hallucination Detection for Text Generation via Graph-based Contextual Knowledge Triples Modeling
X Fang, Z Huang, Z Tian, M Fang, Z Pan, Q Fang, Z Wen, H Pan, D Li
arXiv preprint arXiv:2409.11283, 2024
Cited by: 1, Year: 2024
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
X Cheng, D Fu, X Yang, M Fang, R Hu, J Lu, B Jionghao, Z Wang, S Ji, ...
arXiv preprint arXiv:2501.01384, 2025
2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation
W Liu, J Bai, X Cheng, J Zuo, Z Jiang, S Ji, M Fang, X Yang, Q Yang, ...
Proceedings of the 31st International Conference on Computational …, 2025
2025
Speech Watermarking with Discrete Intermediate Representations
S Ji, Z Jiang, J Zuo, M Fang, Y Chen, T Jin, Z Zhao
arXiv preprint arXiv:2412.13917, 2024
2024
MoMu-Diffusion: On Learning Long-Term Motion-Music Synchronization and Correspondence
F You, M Fang, L Tang, R Huang, Y Wang, Z Zhao
arXiv preprint arXiv:2411.01805, 2024
2024
AudioVSR: Enhancing Video Speech Recognition with Audio Data
X Yang, X Cheng, J Duan, H Qiu, M Hong, M Fang, S Ji, J Zuo, Z Hong, ...
Proceedings of the 2024 Conference on Empirical Methods in Natural Language …, 2024
2024
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
X Cheng, S Zheng, Z Wang, M Fang, Z Zhang, R Huang, Z Ma, S Ji, J Zuo, ...
arXiv preprint arXiv:2410.21269, 2024
2024
AVSET-10M: An Open Large-Scale Audio-Visual Dataset with High Correspondence
X Cheng, Z Zhang, Z Wang, M Fang, R Huang, S Zheng, R Hu, ...
2024
Advancing Multimodal Unified Discrete Representations
H Huang, Y Xia, S Ji, S Wang, H Wang, M Fang, J Zhu, Z Dong, Z Wang, ...
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control
S Ji, J Zuo, W Wang, M Fang, Q Chen, Z Jiang, H Huang, Z Wang, ...
MindLoc: A Secure Brain-Based System for Object Localization
X Yang, X Cheng, JY Lu, H Qiu, M Fang, W Yan, Z Jiang, J Zuo, S Ji, ...
Articles 1–18