| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Paraformer: Fast and accurate parallel transformer for non-autoregressive end-to-end speech recognition | Z Gao, S Zhang, I McLoughlin, Z Yan | arXiv preprint arXiv:2206.08317 | 102 | 2022 |
| Improving Aggregation and Loss Function for Better Embedding Learning in End-to-End Speaker Verification System | Z Gao, Y Song, IV McLoughlin, P Li, Y Jiang, LR Dai | INTERSPEECH 2019, 361-365 | 88 | 2019 |
| emotion2vec: Self-supervised pre-training for speech emotion representation | Z Ma, Z Zheng, J Ye, J Li, Z Gao, S Zhang, X Chen | arXiv preprint arXiv:2312.15185 | 82 | 2023 |
| LauraGPT: Listen, attend, understand, and regenerate audio with GPT | Z Du, J Wang, Q Chen, Y Chu, Z Gao, Z Li, K Hu, X Zhou, J Xu, Z Ma, ... | arXiv preprint arXiv:2310.04673 | 68 | 2023 |
| CosyVoice: A scalable multilingual zero-shot text-to-speech synthesizer based on supervised semantic tokens | Z Du, Q Chen, S Zhang, K Hu, H Lu, Y Yang, H Hu, S Zheng, Y Gu, Z Ma, ... | arXiv preprint arXiv:2407.05407 | 63 | 2024 |
| FunASR: A Fundamental End-to-End Speech Recognition Toolkit | Z Gao, Z Li, J Wang, H Luo, X Shi, M Chen, Y Li, L Zuo, Z Du, Z Xiao, ... | INTERSPEECH 2023 | 53 | 2023 |
| SAN-M: Memory equipped self-attention for end-to-end speech recognition | Z Gao, S Zhang, M Lei, I McLoughlin | INTERSPEECH 2020, 6-10 | 36 | 2020 |
| An Effective Deep Embedding Learning Architecture for Speaker Verification | Y Jiang, Y Song, IV McLoughlin, Z Gao, LR Dai | INTERSPEECH 2019, 4040-4044 | 36 | 2019 |
| Streaming chunk-aware multihead attention for online end-to-end speech recognition | S Zhang, Z Gao, H Luo, M Lei, J Gao, Z Yan, L Xie | INTERSPEECH 2020, 2142-2146 | 32 | 2020 |
| An improved deep embedding learning method for short duration speaker verification | Z Gao, Y Song, IV McLoughlin, W Guo, LR Dai | INTERSPEECH 2018, 3578-3582 | 32 | 2018 |
| An embarrassingly simple approach for LLM with strong ASR capacity | Z Ma, G Yang, Y Yang, Z Gao, J Wang, Z Du, F Yu, Q Chen, S Zheng, ... | arXiv preprint arXiv:2402.08846 | 26 | 2024 |
| FunAudioLLM: Voice understanding and generation foundation models for natural interaction between humans and LLMs | K An, Q Chen, C Deng, Z Du, C Gao, Z Gao, Y Gu, T He, H Hu, K Hu, S Ji, ... | arXiv preprint arXiv:2407.04051 | 25 | 2024 |
| Extremely Low Footprint End-to-End ASR System for Smart Device | Z Gao, Y Yao, S Zhang, J Yang, M Lei, I McLoughlin | INTERSPEECH 2021, 4548-4552 | 16 | 2021 |
| Universal ASR: Unifying streaming and non-streaming ASR using a single encoder-decoder model | Z Gao, S Zhang, M Lei, I McLoughlin | arXiv preprint arXiv:2010.14099 | 16 | 2020 |
| SeACo-Paraformer: A non-autoregressive ASR system with flexible and effective hotword customization ability | X Shi, Y Yang, Z Li, Y Chen, Z Gao, S Zhang | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and … | 13 | 2024 |
| MaLa-ASR: Multimedia-assisted LLM-based ASR | G Yang, Z Ma, F Yu, Z Gao, S Zhang, X Chen | arXiv preprint arXiv:2406.05839 | 8 | 2024 |
| CosyVoice 2: Scalable streaming speech synthesis with large language models | Z Du, Y Wang, Q Chen, X Shi, X Lv, T Zhao, Z Gao, Y Yang, C Gao, ... | arXiv preprint arXiv:2412.10117 | 6 | 2024 |
| MinMo: A multimodal large language model for seamless voice interaction | Q Chen, Y Chen, Y Chen, M Chen, Y Chen, C Deng, Z Du, R Gao, C Gao, ... | arXiv preprint arXiv:2501.06282 | 2 | 2025 |
| Wav2vec-MoE: An unsupervised pre-training and adaptation method for multi-accent ASR | Y Lin, S Zhang, Z Gao, L Wang, Y Yang, J Dang | Electronics Letters 59 (11), e12823 | 2 | 2023 |
| CTC-Assisted LLM-Based Contextual ASR | G Yang, Z Ma, Z Gao, S Zhang, X Chen | 2024 IEEE Spoken Language Technology Workshop (SLT), 126-131 | 1 | 2024 |