Jialong Zuo
Verified email at zju.edu.cn
Title
Cited by
Year
Textrolspeech: A text style control speech corpus with codec language text-to-speech models
S Ji, J Zuo, M Fang, Z Jiang, F Chen, X Duan, B Huai, Z Zhao
ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by 35, 2024
Wavtokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling
S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo, Q Yang, X Cheng, Z Wang, ...
arXiv preprint arXiv:2408.16532, 2024
Cited by 25, 2024
Language-codec: Reducing the gaps between discrete codec representation and speech language models
S Ji, M Fang, Z Jiang, S Zheng, Q Chen, R Huang, J Zuo, S Wang, Z Zhao
arXiv preprint arXiv:2402.12208, 2024
Cited by 16, 2024
FluentSpeech: Stutter-Oriented Automatic Speech Editing with Context-Aware Diffusion Models
Z Jiang, Q Yang, J Zuo, Z Ye, R Huang, Y Ren, Z Zhao
arXiv preprint arXiv:2305.13612, 2023
Cited by 13, 2023
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
S Ji, Z Jiang, H Wang, J Zuo, Z Zhao
arXiv preprint arXiv:2402.09378, 2024
Cited by 8, 2024
Wavchat: A survey of spoken dialogue models
S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang, Z Jiang, L Zhou, S Liu, ...
arXiv preprint arXiv:2411.13577, 2024
Cited by 7, 2024
Controlspeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style control with decoupled codec
S Ji, J Zuo, W Wang, M Fang, S Zheng, Q Chen, Z Jiang, H Huang, ...
arXiv preprint arXiv:2406.01205, 2024
Cited by 7, 2024
Ace: A generative cross-modal retrieval framework with coarse-to-fine semantic modeling
M Fang, S Ji, J Zuo, H Huang, Y Xia, J Zhu, X Cheng, X Yang, W Liu, ...
arXiv preprint arXiv:2406.17507, 2024
Cited by 5, 2024
Synctalklip: Highly synchronized lip-readable speaker generation with multi-task learning
X Yang, X Cheng, D Fu, M Fang, J Zuo, S Ji, Z Zhao, J Tao
Proceedings of the 32nd ACM International Conference on Multimedia, 8149-8158, 2024
Cited by 4, 2024
Mscenespeech: A multi-scene speech dataset for expressive speech synthesis
Q Yang, J Zuo, Z Su, Z Jiang, M Li, Z Zhao, F Chen, Z Wang, B Huai
arXiv preprint arXiv:2407.14006, 2024
Cited by 1, 2024
Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
J Zuo, S Ji, M Fang, Z Jiang, X Cheng, Q Yang, W Liu, G Zhang, Z Tu, ...
arXiv preprint arXiv:2502.05471, 2025
2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation
W Liu, J Bai, X Cheng, J Zuo, Z Jiang, S Ji, M Fang, X Yang, Q Yang, ...
Proceedings of the 31st International Conference on Computational …, 2025
2025
Speech Watermarking with Discrete Intermediate Representations
S Ji, Z Jiang, J Zuo, M Fang, Y Chen, T Jin, Z Zhao
arXiv preprint arXiv:2412.13917, 2024
2024
AudioVSR: Enhancing Video Speech Recognition with Audio Data
X Yang, X Cheng, J Duan, H Qiu, M Hong, M Fang, S Ji, J Zuo, Z Hong, ...
Proceedings of the 2024 Conference on Empirical Methods in Natural Language …, 2024
2024
OmniSep: Unified Omni-Modality Sound Separation with Query-Mixup
X Cheng, S Zheng, Z Wang, M Fang, Z Zhang, R Huang, Z Ma, S Ji, J Zuo, ...
arXiv preprint arXiv:2410.21269, 2024
2024
ControlSpeech: Towards Simultaneous Zero-shot Speaker Cloning and Zero-shot Language Style Control
S Ji, J Zuo, W Wang, M Fang, Q Chen, Z Jiang, H Huang, Z Wang, ...