WeNet 2.0: More productive end-to-end speech recognition toolkit B Zhang, D Wu, Z Peng, X Song, Z Yao, H Lv, L Xie, C Yang, F Pan, J Niu arXiv preprint arXiv:2203.15455, 2022 | 100 | 2022 |
Speech-XLNet: Unsupervised acoustic model pretraining for self-attention networks X Song, G Wang, Z Wu, Y Huang, D Su, D Yu, H Meng arXiv preprint arXiv:1910.10387, 2019 | 60 | 2019 |
Non-autoregressive Transformer ASR with CTC-enhanced decoder input X Song, Z Wu, Y Huang, C Weng, D Su, H Meng ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 40 | 2021 |
SpecSwap: A Simple Data Augmentation Method for End-to-End Speech Recognition. X Song, Z Wu, Y Huang, D Su, H Meng Interspeech, 581-585, 2020 | 38 | 2020 |
ZeroPrompt: Streaming acoustic encoders are zero-shot masked LMs X Song, D Wu, B Zhang, Z Peng, B Dang, F Pan, Z Wu arXiv preprint arXiv:2305.10649, 2023 | 25 | 2023 |
CB-Conformer: Contextual biasing Conformer for biased word recognition Y Xu, B Liu, Q Huang, X Song, Z Wu, S Kang, H Meng ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 15 | 2023 |
LightGrad: Lightweight diffusion probabilistic model for text-to-speech J Chen, X Song, Z Peng, B Zhang, F Pan, Z Wu ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 14 | 2023 |
Multi-level attention based BLSTM neural network for biomedical event extraction X He, L Li, X Song, D Huang, F Ren IEICE Transactions on Information and Systems 102 (9), 1842-1850, 2019 | 14 | 2019 |
Fast-U2++: Fast and accurate end-to-end speech recognition in joint CTC/attention frames C Liang, XL Zhang, BB Zhang, D Wu, S Li, X Song, Z Peng, F Pan ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 10 | 2023 |
TrimTail: Low-latency streaming ASR with simple but effective spectrogram-level length penalty X Song, D Wu, Z Wu, B Zhang, Y Zhang, Z Peng, W Li, F Pan, C Zhu ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 8 | 2023 |
U2++ MoE: Scaling 4.7x parameters with minimal impact on RTF X Song, D Wu, B Zhang, D Zhou, Z Peng, B Dang, F Pan, C Yang arXiv preprint arXiv:2404.16407, 2024 | 5 | 2024 |
Spike-Triggered Contextual Biasing for End-to-End Mandarin Speech Recognition K Huang, A Zhang, B Zhang, T Xu, X Song, L Xie 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), 1-8, 2023 | 4 | 2023 |
A random gossip BMUF process for neural language modeling Y Huang, J Tian, L Han, G Wang, X Song, D Su, D Yu ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 3 | 2020 |
TouchASP: Elastic Automatic Speech Perception that Everyone Can Touch X Song, C Liang, B Zhang, P Zhang, ZY Wang, Y Ma, M Xu, L Wang, ... arXiv preprint arXiv:2412.15622, 2024 | 2 | 2024 |
TouchTTS: An Embarrassingly Simple TTS Framework that Everyone Can Touch X Song, M Xing, C Ma, S Li, D Wu, B Zhang, F Pan, D Zhou, Y Zhang, ... arXiv preprint arXiv:2412.08237, 2024 | 2 | 2024 |
FusionFormer: Fusing Operations in Transformer for Efficient Streaming Speech Recognition X Song, D Wu, B Zhang, Z Wu, W Li, D Li, P Zhang, Z Peng, F Pan, C Zhu, ... arXiv preprint arXiv:2210.17079, 2022 | 2 | 2022 |
Hydraformer: One Encoder for All Subsampling Rates Y Xu, X Song, Z Wu, D Wu, Z Peng, B Zhang 2024 IEEE International Conference on Multimedia and Expo (ICME), 1-6, 2024 | | 2024 |