BLVD: Building a large-scale 5D semantics benchmark for autonomous driving J Xue, J Fang, T Li, B Zhang, P Zhang, Z Ye, J Dou 2019 International Conference on Robotics and Automation (ICRA), 6685-6691, 2019 | 72 | 2019 |
CoMoSpeech: One-step speech and singing voice synthesis via consistency model Z Ye, W Xue, X Tan, J Chen, Q Liu, Y Guo Proceedings of the 31st ACM International Conference on Multimedia, 1831-1839, 2023 | 37 | 2023 |
FlashSpeech: Efficient Zero-Shot Speech Synthesis Z Ye, Z Ju, H Liu, X Tan, J Chen, Y Lu, P Sun, J Pan, W Bian, S He, W Xue, ... ACM MM 2024, 2024 | 12 | 2024 |
CoMoSVC: Consistency model-based singing voice conversion Y Lu, Z Ye, W Xue, X Tan, Q Liu, Y Guo 2024 IEEE 14th International Symposium on Chinese Spoken Language Processing …, 2024 | 10 | 2024 |
MFC-Bench: Benchmarking multimodal fact-checking with large vision-language models S Wang, H Lin, Z Luo, Z Ye, G Chen, J Ma arXiv preprint arXiv:2406.11288, 2024 | 5 | 2024 |
NAS-FM: Neural architecture search for tunable and interpretable sound synthesis based on frequency modulation Z Ye, W Xue, X Tan, Q Liu, Y Guo IJCAI 2023, 2023 | 5 | 2023 |
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model Z Ye, P Sun, J Lei, H Lin, X Tan, Z Dai, Q Kong, J Chen, J Pan, Q Liu, ... AAAI 2025, 2024 | 4 | 2024 |
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation J Chen, W Xue, X Tan, Z Ye, Q Liu, Y Guo IJCAI 2024, 2024 | 2 | 2024 |
Llasa: Scaling Train-Time and Inference-Time Compute for Llama-based Speech Synthesis Z Ye, X Zhu, CM Chan, X Wang, X Tan, J Lei, Y Peng, H Liu, Y Jin, Z Dai, ... arXiv preprint arXiv:2502.04128, 2025 | 1 | 2025 |
Both Ears Wide Open: Towards Language-Driven Spatial Audio Generation P Sun, S Cheng, X Li, Z Ye, H Liu, H Zhang, W Xue, Y Guo arXiv preprint arXiv:2410.10676, 2024 | 1 | 2024 |
ScratchEval: Are GPT-4o Smarter than My Child? Evaluating Large Multimodal Models with Visual Programming Challenges R Fu, Z Luo, H Lin, Z Ye, J Ma arXiv preprint arXiv:2411.18932, 2024 | | 2024 |
PyramidCodec: Hierarchical Codec for Long-form Music Generation in Audio Domain J Chen, Z Dai, Z Ye, X Tan, Q Liu, Y Guo, W Xue Findings of the Association for Computational Linguistics: EMNLP 2024, 4253-4263, 2024 | | 2024 |