SALMONN: Towards generic hearing abilities for large language models C Tang, W Yu, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang arXiv preprint arXiv:2310.13289, 2023 | 219 | 2023 |
Improving non-native mispronunciation detection and enriching diagnostic feedback with DNN-based speech attribute modeling W Li, SM Siniscalchi, NF Chen, CH Lee 2016 IEEE international conference on acoustics, speech and signal …, 2016 | 109 | 2016 |
LLaVA-NeXT-Interleave: Tackling multi-image, video, and 3D in large multimodal models F Li, R Zhang, H Zhang, Y Zhang, B Li, W Li, Z Ma, C Li arXiv preprint arXiv:2407.07895, 2024 | 100 | 2024 |
Connecting speech encoder and large language model for ASR W Yu, C Tang, G Sun, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 45 | 2024 |
Improving Mispronunciation Detection for Non-Native Learners with Multisource Information and LSTM-Based Deep Models. W Li, NF Chen, SM Siniscalchi, CH Lee Interspeech, 2759-2763, 2017 | 44 | 2017 |
Video instruction tuning with synthetic data Y Zhang, J Wu, W Li, B Li, Z Ma, Z Liu, C Li arXiv preprint arXiv:2410.02713, 2024 | 38 | 2024 |
Detecting Mispronunciations of L2 Learners and Providing Corrective Feedback Using Knowledge-Guided and Data-Driven Decision Trees. W Li, K Li, SM Siniscalchi, NF Chen, CH Lee Interspeech 2016, 3127-3131, 2016 | 33 | 2016 |
Improving Mandarin tone recognition based on DNN by combining acoustic and articulatory features using extended recognition networks J Lin, W Li, Y Gao, Y Xie, NF Chen, SM Siniscalchi, J Zhang, CH Lee Journal of Signal Processing Systems 90, 1077-1087, 2018 | 30 | 2018 |
Improving mispronunciation detection of Mandarin tones for non-native learners with soft-target tone labels and BLSTM-based deep tone models W Li, NF Chen, SM Siniscalchi, CH Lee IEEE/ACM Transactions on Audio, Speech, and Language Processing 27 (12 …, 2019 | 29 | 2019 |
A cross-task transfer learning approach to adapting deep speech enhancement models to unseen background noise using paired senone classifiers S Wang, W Li, SM Siniscalchi, CH Lee ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 24 | 2020 |
A study on functional loads of phonetic contrasts under context based on mutual information of Chinese text and phonemes J Zhang, W Li, Y Hou, W Cao, Z Xiong 2010 7th International Symposium on Chinese Spoken Language Processing, 194-198, 2010 | 23 | 2010 |
Improving audio-visual speech recognition performance with cross-modal student-teacher training W Li, S Wang, M Lei, SM Siniscalchi, CH Lee ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 21 | 2019 |
Improving accent conversion with reference encoder and end-to-end text-to-speech W Li, B Tang, X Yin, Y Zhao, W Li, K Wang, H Huang, Y Wang, Z Ma arXiv preprint arXiv:2005.09271, 2020 | 15 | 2020 |
video-SALMONN: Speech-enhanced audio-visual large language models G Sun, W Yu, C Tang, X Chen, T Tan, W Li, L Lu, Z Ma, Y Wang, C Zhang arXiv preprint arXiv:2406.15704, 2024 | 14 | 2024 |
Fine-grained audio-visual joint representations for multimodal large language models G Sun, W Yu, C Tang, X Chen, T Tan, W Li, L Lu, Z Ma, C Zhang arXiv preprint arXiv:2310.05863, 2023 | 11 | 2023 |
An ASR-free fluency scoring approach with self-supervised learning W Liu, K Fu, X Tian, S Shi, W Li, Z Ma, T Lee ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 11 | 2023 |
Improving non-native word-level pronunciation scoring with phone-level mixup data augmentation and multi-source information K Fu, S Gao, K Wang, W Li, X Tian, Z Ma arXiv preprint arXiv:2203.01826, 2022 | 10 | 2022 |
A transfer and multi-task learning based approach for MOS prediction X Tian, K Fu, S Gao, Y Gu, K Wang, W Li, Z Ma Proc. Interspeech 2022, 5438-5442, 2022 | 10 | 2022 |
Improving Mandarin tone mispronunciation detection for non-native learners with soft-target tone labels and BLSTM-based deep models W Li, NF Chen, SM Siniscalchi, CH Lee 2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018 | 9 | 2018 |
Using Fluency Representation Learned from Sequential Raw Features for Improving Non-native Fluency Scoring. K Fu, S Gao, X Tian, W Li, Z Ma, A Bytedance INTERSPEECH, 4337-4341, 2022 | 8 | 2022 |