shengpeng ji
Verified email at zju.edu.cn
Title
Cited by
Year
Symphonize 3d semantic scene completion with contextual instance queries
H Jiang, T Cheng, N Gao, H Zhang, T Lin, W Liu, X Wang
Wrong MegaTTS, 2024
210
2024
Mega-TTS 2: Boosting Prompting Mechanisms for Zero-Shot Speech Synthesis
Z Jiang, J Liu, Y Ren, J He, Z Ye, S Ji, Q Yang, C Zhang, P Wei, C Wang, ...
ICLR 2024, 2024
69*
2024
Textrolspeech: A text style control speech corpus with codec language text-to-speech models
S Ji, J Zuo, M Fang, Z Jiang, F Chen, X Duan, B Huai, Z Zhao
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
37
2024
Wavtokenizer: an efficient acoustic discrete codec tokenizer for audio language modeling
S Ji, Z Jiang, W Wang, Y Chen, M Fang, J Zuo, Q Yang, X Cheng, Z Wang, ...
ICLR 2025, 2024
28
2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
technical report
CosyVoice and SenseVoice, 2024
25*
2024
Language-codec: Reducing the gaps between discrete codec representation and speech language models
S Ji, M Fang, Z Jiang, S Zheng, Q Chen, R Huang, J Zuo, S Wang, Z Zhao
arXiv preprint arXiv:2402.12208, 2024
16
2024
Wavchat: A survey of spoken dialogue models
S Ji, Y Chen, M Fang, J Zuo, J Lu, H Wang, Z Jiang, L Zhou, S Liu, ...
arXiv preprint arXiv:2411.13577, 2024
12*
2024
Controlspeech: Towards simultaneous zero-shot speaker cloning and zero-shot language style control with decoupled codec
S Ji, J Zuo, W Wang, M Fang, S Zheng, Q Chen, Z Jiang, H Huang, ...
arXiv preprint arXiv:2406.01205, 2024
8
2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
S Ji, Z Jiang, H Wang, J Zuo, Z Zhao
ACL 2024 Main, 2024
8
2024
Ace: A generative cross-modal retrieval framework with coarse-to-fine semantic modeling
M Fang, S Ji, J Zuo, H Huang, Y Xia, J Zhu, X Cheng, X Yang, W Liu, ...
arXiv preprint arXiv:2406.17507, 2024
5
2024
Synctalklip: Highly synchronized lip-readable speaker generation with multi-task learning
X Yang, X Cheng, D Fu, M Fang, J Zuo, S Ji, Z Zhao, J Tao
ACM MM 2024, 2024
4
2024
Generating Neural Networks for Diverse Networking Classification Tasks via Hardware-Aware Neural Architecture Search
G Xie, Q Li, Z Shi, H Fang, S Ji, Y Jiang, Z Yuan, L Ma, M Xu
IEEE Transactions on Computers 73 (2), 481-494, 2023
4
2023
Unlocking the potential of multimodal unified discrete representation through training-free codebook optimization and hierarchical alignment
H Huang, Y Xia, S Ji, S Wang, H Wang, J Zhu, Z Dong, Z Zhao
arXiv preprint arXiv:2403.05168, 2024
3
2024
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
R Li, S Zheng, X Cheng, Z Zhang, S Ji, Z Zhao
arXiv preprint arXiv:2410.12957, 2024
2
2024
WavRAG: Audio-Integrated Retrieval Augmented Generation for Spoken Dialogue Models
Y Chen, S Ji, H Wang, Z Wang, S Chen, J He, J Xu, Z Zhao
arXiv preprint arXiv:2502.14727, 2025
2025
Enhancing Expressive Voice Conversion with Discrete Pitch-Conditioned Flow Matching Model
J Zuo, S Ji, M Fang, Z Jiang, X Cheng, Q Yang, W Liu, G Zhang, Z Tu, ...
ICASSP 2025, 2025
2025
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
X Cheng, D Fu, X Yang, M Fang, R Hu, J Lu, B Jionghao, Z Wang, S Ji, ...
arXiv preprint arXiv:2501.01384, 2025
2025
VoxpopuliTTS: a large-scale multilingual TTS corpus for zero-shot speech generation
W Liu, J Bai, X Cheng, J Zuo, Z Jiang, S Ji, M Fang, X Yang, Q Yang, ...
COLING 2025, 2025
2025
Speech Watermarking with Discrete Intermediate Representations
S Ji, Z Jiang, J Zuo, M Fang, Y Chen, T Jin, Z Zhao
AAAI 2025, 2024
2024
LLaVA-MR: Large Language-and-Vision Assistant for Video Moment Retrieval
W Lu, J Li, A Yu, MC Chang, S Ji, M Xia
arXiv preprint arXiv:2411.14505, 2024
2024