The SJTU robust anti-spoofing system for the ASVspoof 2019 challenge. Y Yang, H Wang, H Dinkel, Z Chen, S Wang, Y Qian, K Yu Interspeech, 1038-1042, 2019 | 61 | 2019 |
CosyVoice: A scalable multilingual zero-shot text-to-speech synthesizer based on supervised semantic tokens Z Du, Q Chen, S Zhang, K Hu, H Lu, Y Yang, H Hu, S Zheng, Y Gu, Z Ma, ... arXiv preprint arXiv:2407.05407, 2024 | 54 | 2024 |
Data augmentation using deep generative models for embedding based speaker recognition S Wang, Y Yang, Z Wu, Y Qian, K Yu IEEE/ACM Transactions on Audio, Speech, and Language Processing 28, 2598-2609, 2020 | 53 | 2020 |
Revisiting the statistics pooling layer in deep speaker embedding learning S Wang, Y Yang, Y Qian, K Yu 2021 12th International Symposium on Chinese Spoken Language Processing …, 2021 | 50 | 2021 |
Knowledge distillation for small foot-print deep speaker embedding S Wang, Y Yang, T Wang, Y Qian, K Yu ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 42 | 2019 |
AISpeech-SJTU accent identification system for the Accented English Speech Recognition Challenge H Huang, X Xiang, Y Yang, R Ma, Y Qian ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 32 | 2021 |
FunAudioLLM: Voice understanding and generation foundation models for natural interaction between humans and LLMs K An, Q Chen, C Deng, Z Du, C Gao, Z Gao, Y Gu, T He, H Hu, K Hu, S Ji, ... arXiv preprint arXiv:2407.04051, 2024 | 25 | 2024 |
Generative adversarial networks based x-vector augmentation for robust probabilistic linear discriminant analysis in speaker verification Y Yang, S Wang, M Sun, Y Qian, K Yu 2018 11th International Symposium on Chinese Spoken Language Processing …, 2018 | 21 | 2018 |
SeACo-Paraformer: A non-autoregressive ASR system with flexible and effective hotword customization ability X Shi, Y Yang, Z Li, Y Chen, Z Gao, S Zhang ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 13 | 2024 |
Text adaptation for speaker verification with speaker-text factorized embeddings Y Yang, S Wang, X Gong, Y Qian, K Yu ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 13 | 2020 |
Speaker embedding augmentation with noise distribution matching X Gong, Z Chen, Y Yang, S Wang, L Wang, Y Qian 2021 12th International Symposium on Chinese Spoken Language Processing …, 2021 | 3 | 2021 |
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Q Chen, Y Chen, Y Chen, M Chen, Y Chen, C Deng, Z Du, R Gao, C Gao, ... arXiv preprint arXiv:2501.06282, 2025 | 1 | 2025 |
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models Z Du, Y Wang, Q Chen, X Shi, X Lv, T Zhao, Z Gao, Y Yang, C Gao, ... arXiv preprint arXiv:2412.10117, 2024 | 1 | 2024 |