The accented English speech recognition challenge 2020: open datasets, tracks, baselines, results and methods X Shi, F Yu, Y Lu, Y Liang, Q Feng, D Wang, Y Qian, L Xie ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 79 | 2021 |
The ASRU 2019 Mandarin-English code-switching speech recognition challenge: Open datasets, tracks, methods and results X Shi, Q Feng, L Xie arXiv preprint arXiv:2007.05916, 2020 | 57* | 2020 |
FunASR: A fundamental end-to-end speech recognition toolkit Z Gao, Z Li, J Wang, H Luo, X Shi, M Chen, Y Li, L Zuo, Z Du, Z Xiao, ... arXiv preprint arXiv:2305.11013, 2023 | 52 | 2023 |
Cascade RNN-transducer: Syllable based streaming on-device Mandarin speech recognition with a syllable-to-character converter X Wang, Z Yao, X Shi, L Xie 2021 IEEE Spoken Language Technology Workshop (SLT), 15-21, 2021 | 32 | 2021 |
FunAudioLLM: Voice understanding and generation foundation models for natural interaction between humans and LLMs K An, Q Chen, C Deng, Z Du, C Gao, Z Gao, Y Gu, T He, H Hu, K Hu, S Ji, ... arXiv preprint arXiv:2407.04051, 2024 | 25 | 2024 |
Efficient gradient-based neural architecture search for end-to-end ASR X Shi, P Zhou, W Chen, L Xie Companion Publication of the 2021 International Conference on Multimodal …, 2021 | 21* | 2021 |
SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability X Shi, Y Yang, Z Li, Y Chen, Z Gao, S Zhang arXiv preprint arXiv:2308.03266, 2023 | 13 | 2023 |
Achieving timestamp prediction while recognizing with non-autoregressive end-to-end ASR model X Shi, Y Chen, S Zhang, Z Yan National Conference on Man-Machine Speech Communication, 89-100, 2022 | 9 | 2022 |
BAT: Boundary aware transducer for memory-efficient and low-latency ASR K An, X Shi, S Zhang arXiv preprint arXiv:2305.11571, 2023 | 8 | 2023 |
Linguistic-acoustic similarity based accent shift for accent recognition Q Shao, J Yan, J Kang, P Guo, X Shi, P Hu, L Xie arXiv preprint arXiv:2204.03398, 2022 | 8 | 2022 |
SlideSpeech: A Large Scale Slide-Enriched Audio-Visual Corpus H Wang, F Yu, X Shi, Y Wang, S Zhang, M Li ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 7 | 2024 |
LCB-Net: Long-Context Biasing for Audio-Visual Speech Recognition F Yu, H Wang, X Shi, S Zhang ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 2 | 2024 |
MinMo: A Multimodal Large Language Model for Seamless Voice Interaction Q Chen, Y Chen, Y Chen, M Chen, Y Chen, C Deng, Z Du, R Gao, C Gao, ... arXiv preprint arXiv:2501.06282, 2025 | 1 | 2025 |
CosyVoice 2: Scalable Streaming Speech Synthesis with Large Language Models Z Du, Y Wang, Q Chen, X Shi, X Lv, T Zhao, Z Gao, Y Yang, C Gao, ... arXiv preprint arXiv:2412.10117, 2024 | 1 | 2024 |
Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System X Shi, H Luo, Z Gao, S Zhang, Z Yan arXiv preprint arXiv:2305.10680, 2023 | 1 | 2023 |