Ziyang Ma
Title · Cited by · Year
emotion2vec: Self-supervised pre-training for speech emotion representation
Z Ma, Z Zheng, J Ye, J Li, Z Gao, S Zhang, X Chen
Proc. ACL 2024, 2024
Cited by 80 · 2024
LauraGPT: Listen, attend, understand, and regenerate audio with GPT
Z Du, J Wang, Q Chen, Y Chu, Z Gao, Z Li, K Hu, X Zhou, J Xu, Z Ma, ...
arXiv preprint arXiv:2310.04673, 2023
Cited by 68* · 2023
CosyVoice: A scalable multilingual zero-shot text-to-speech synthesizer based on supervised semantic tokens
Z Du, Q Chen, S Zhang, K Hu, H Lu, Y Yang, H Hu, S Zheng, Y Gu, Z Ma, ...
arXiv preprint arXiv:2407.05407, 2024
Cited by 62 · 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Z Ma, G Yang, Y Yang, Z Gao, J Wang, Z Du, F Yu, Q Chen, S Zheng, ...
arXiv preprint arXiv:2402.08846, 2024
Cited by 41 · 2024
ChatMusician: Understanding and generating music intrinsically with LLM
R Yuan, H Lin, Y Wang, Z Tian, S Wu, T Shen, G Zhang, Y Wu, C Liu, ...
Proc. ACL 2024, 2024
Cited by 35 · 2024
ELLA-V: Stable neural codec language modeling with alignment-guided sequence reordering
Y Song, Z Chen, X Wang, Z Ma, X Chen
Proc. AAAI 2025, 2024
Cited by 31 · 2024
VoiceFlow: Efficient text-to-speech with rectified flow matching
Y Guo, C Du, Z Ma, X Chen, K Yu
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by 28* · 2024
Towards universal speech discrete tokens: A case study for asr and tts
Y Yang, F Shen, C Du, Z Ma, K Yu, D Povey, X Chen
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by 28* · 2024
MAP-Neo: Highly capable and transparent bilingual large language model series
G Zhang, S Qu, J Liu, C Zhang, C Lin, CL Yu, D Pan, E Cheng, J Liu, ...
arXiv preprint arXiv:2405.19327, 2024
Cited by 27 · 2024
FunAudioLLM: Voice understanding and generation foundation models for natural interaction between humans and LLMs
K An, Q Chen, C Deng, Z Du, C Gao, Z Gao, Y Gu, T He, H Hu, K Hu, S Ji, ...
arXiv preprint arXiv:2407.04051, 2024
Cited by 25 · 2024
MT4SSL: Boosting self-supervised speech representation learning by integrating multiple targets
Z Ma, Z Zheng, C Tang, Y Wang, X Chen
Proc. Interspeech 2023 Best Student Paper Shortlist, 2023
Cited by 22 · 2023
MER 2024: Semi-supervised learning, noise robustness, and open-vocabulary multimodal emotion recognition
Z Lian, H Sun, L Sun, Z Wen, S Zhang, S Chen, H Gu, J Zhao, Z Ma, ...
Proceedings of the 2nd International Workshop on Multimodal and Responsible …, 2024
Cited by 21 · 2024
F5-TTS: A fairytaler that fakes fluent and faithful speech with flow matching
Y Chen, Z Niu, Z Ma, K Deng, C Wang, J Zhao, K Yu, X Chen
arXiv preprint arXiv:2410.06885, 2024
Cited by 20 · 2024
EmoBox: Multilingual Multi-corpus Speech Emotion Recognition Toolkit and Benchmark
Z Ma, M Chen, H Zhang, Z Zheng, W Chen, X Li, J Ye, X Chen, T Hain
Proc. Interspeech 2024, 2024
Cited by 17 · 2024
Leveraging speech PTM, text LLM, and emotional TTS for speech emotion recognition
Z Ma, W Wu, Z Zheng, Y Guo, Q Chen, S Zhang, X Chen
ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024
Cited by 17 · 2024
EAT: Self-supervised pre-training with efficient audio transformer
W Chen, Y Liang, Z Ma, Z Zheng, X Chen
Proc. IJCAI 2024, 2024
Cited by 16 · 2024
Language Model Can Listen While Speaking
Z Ma, Y Song, C Du, J Cong, Z Chen, Y Wang, Y Wang, X Chen
Proc. AAAI 2025, 2024
Cited by 14* · 2024
LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR
Z Song, J Zhuo, Y Yang, Z Ma, S Zhang, X Chen
Proc. Interspeech 2024, 2024
Cited by 13 · 2024
Chinese tiny llm: Pretraining a chinese-centric large language model
X Du, Z Yu, S Gao, D Pan, Y Cheng, Z Ma, R Yuan, X Qu, J Liu, T Zheng, ...
Proc. 1st COLM, 2024
Cited by 13 · 2024
Foundation models for music: A survey
Y Ma, A Øland, A Ragni, BMS Del Sette, C Saitis, C Donahue, C Lin, ...
arXiv preprint arXiv:2408.14340, 2024
Cited by 12 · 2024
Articles 1–20