ADD 2022: the first audio deep synthesis detection challenge J Yi, R Fu, J Tao, S Nie, H Ma, C Wang, T Wang, Z Tian, Y Bai, C Fan, ... ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 206 | 2022 |
ADD 2023: the Second Audio Deepfake Detection Challenge J Yi, J Tao, R Fu, X Yan, C Wang, T Wang, CY Zhang, X Zhang, Y Zhao, ... IJCAI 2023 Workshop on Deepfake Audio Detection and Analysis (DADA 2023), 2023 | 112 | 2023 |
Half-truth: A partially fake audio detection dataset J Yi, Y Bai, J Tao, Z Tian, C Wang, T Wang, R Fu INTERSPEECH, 2021 | 93 | 2021 |
Gated recurrent fusion with joint training framework for robust end-to-end speech recognition C Fan, J Yi, J Tao, Z Tian, B Liu, Z Wen IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 198-209, 2020 | 93 | 2020 |
Self-attention transducers for end-to-end speech recognition Z Tian, J Yi, J Tao, Y Bai, Z Wen INTERSPEECH, 2019 | 83 | 2019 |
Synchronous transformers for end-to-end speech recognition Z Tian, J Yi, Y Bai, J Tao, S Zhang, Z Wen ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 80 | 2020 |
Self-attention based model for punctuation prediction using word and speech embeddings J Yi, J Tao ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019 | 74 | 2019 |
Continuous multimodal emotion prediction based on long short term memory recurrent neural network J Huang, Y Li, J Tao, Z Lian, Z Wen, M Yang, J Yi Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge, 11-18, 2017 | 73 | 2017 |
Fast end-to-end speech recognition via non-autoregressive models and cross-modal knowledge transferring from BERT Y Bai, J Yi, J Tao, Z Tian, Z Wen, S Zhang IEEE/ACM Transactions on Audio, Speech, and Language Processing 29, 1897-1911, 2021 | 71 | 2021 |
Language-adversarial transfer learning for low-resource speech recognition J Yi, J Tao, Z Wen, Y Bai IEEE/ACM Transactions on Audio, Speech, and Language Processing 27 (3), 621-630, 2018 | 70 | 2018 |
Spike-triggered non-autoregressive transformer for end-to-end speech recognition Z Tian, J Yi, J Tao, Y Bai, S Zhang, Z Wen INTERSPEECH, 2020 | 67 | 2020 |
Audio deepfake detection: A survey J Yi, C Wang, J Tao, X Zhang, CY Zhang, Y Zhao arXiv preprint arXiv:2308.14970, 2023 | 64 | 2023 |
MER 2023: Multi-label learning, modality robustness, and semi-supervised learning Z Lian, H Sun, L Sun, K Chen, M Xu, K Wang, K Xu, Y He, Y Li, J Zhao, ... Proceedings of the 31st ACM International Conference on Multimedia, 9610-9614, 2023 | 58 | 2023 |
CTC regularized model adaptation for improving LSTM RNN based multi-accent Mandarin speech recognition J Yi, Z Wen, J Tao, H Ni, B Liu Journal of Signal Processing Systems 90, 985-997, 2018 | 52 | 2018 |
Listen attentively, and spell once: Whole sentence generation via a non-autoregressive architecture for low-latency speech recognition Y Bai, J Yi, J Tao, Z Tian, Z Wen, S Zhang INTERSPEECH, 2020 | 47 | 2020 |
Adversarial transfer learning for punctuation restoration J Yi, J Tao, Y Bai, Z Tian, C Fan arXiv preprint arXiv:2004.00248, 2020 | 45 | 2020 |
CFAD: A Chinese dataset for fake audio detection H Ma, J Yi, C Wang, X Yan, J Tao, T Wang, S Wang, R Fu Speech Communication 164, 103122, 2024 | 44 | 2024 |
Continual learning for fake audio detection H Ma, J Yi, J Tao, Y Bai, Z Tian, C Wang INTERSPEECH, 2021 | 42 | 2021 |
End-to-end post-filter for speech separation with deep attention fusion features C Fan, J Tao, B Liu, J Yi, Z Wen, X Liu IEEE/ACM Transactions on Audio, Speech, and Language Processing 28, 1303-1314, 2020 | 42 | 2020 |
Learn spelling from teachers: Transferring knowledge from language models to sequence-to-sequence speech recognition Y Bai, J Yi, J Tao, Z Tian, Z Wen INTERSPEECH, 2019 | 41 | 2019 |