Neural codec language models are zero-shot text to speech synthesizers C Wang, S Chen, Y Wu, Z Zhang, L Zhou, S Liu, Z Chen, Y Liu, H Wang, ... arXiv preprint arXiv:2301.02111, 2023 | 637 | 2023 |
An introduction to computational networks and the computational network toolkit D Yu, A Eversole, M Seltzer, K Yao, Z Huang, B Guenter, O Kuchaiev, ... Microsoft Technical Report MSR-TR-2014-112, 2014 | 476 | 2014 |
CLAP: Learning audio concepts from natural language supervision B Elizalde, S Deshmukh, M Al Ismail, H Wang ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 448 | 2023 |
Speak foreign languages with your own voice: Cross-lingual neural codec language modeling Z Zhang, L Zhou, C Wang, S Chen, Y Wu, S Liu, Z Chen, Y Liu, H Wang, ... arXiv preprint arXiv:2303.03926, 2023 | 161 | 2023 |
Pengi: An audio language model for audio tasks S Deshmukh, B Elizalde, R Singh, H Wang Advances in Neural Information Processing Systems 36, 18090-18108, 2023 | 140 | 2023 |
Advances in online audio-visual meeting transcription T Yoshioka, I Abramovski, C Aksoylar, Z Chen, M David, D Dimitriadis, ... 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2019 | 90 | 2019 |
Multi-channel speech separation Z Chen, J Li, X Xiao, T Yoshioka, H Wang, Z Wang, Y Gong US Patent 10,839,822, 2020 | 87 | 2020 |
Personalized speech enhancement: New models and comprehensive evaluation SE Eskimez, T Yoshioka, H Wang, X Wang, Z Chen, X Huang ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 68 | 2022 |
Natural language supervision for general-purpose audio representations B Elizalde, S Deshmukh, H Wang ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 55 | 2024 |
Cracking the cocktail party problem by multi-beam deep attractor network Z Chen, J Li, X Xiao, T Yoshioka, H Wang, Z Wang, Y Gong 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2017 | 55 | 2017 |
Audio retrieval with WavText5K and CLAP training S Deshmukh, B Elizalde, H Wang arXiv preprint arXiv:2209.14275, 2022 | 51 | 2022 |
Fast real-time personalized speech enhancement: End-to-end enhancement network (E3Net) and knowledge distillation M Thakker, SE Eskimez, T Yoshioka, H Wang arXiv preprint arXiv:2204.00771, 2022 | 37 | 2022 |
One model to enhance them all: array geometry agnostic multi-channel personalized speech enhancement H Taherian, SE Eskimez, T Yoshioka, H Wang, Z Chen, X Huang ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 28 | 2022 |
Human listening and live captioning: Multi-task training for speech enhancement SE Eskimez, X Wang, M Tang, H Yang, Z Zhu, Z Chen, H Wang, ... arXiv preprint arXiv:2106.02896, 2021 | 26 | 2021 |
An introduction to computational networks and the computational network toolkit Y Dong, E Adam, S Mike, Y Kaisheng, H Zhi-Heng, G Brian, K Oleksii, ... Tech. Rep. MSR-TR-2014-112, 2014 | 24 | 2014 |
NOTSOFAR-1 challenge: New datasets, baseline, and tasks for distant meeting transcription A Vinnikov, A Ivry, A Hurvitz, I Abramovski, S Koubi, I Gurvich, S Peer, ... arXiv preprint arXiv:2401.08887, 2024 | 20 | 2024 |
An overview of Microsoft Deep QA system on Stanford WebQuestions benchmark Z Wang, S Yan, H Wang, X Huang 2018-09-15]. https://www.microsoft.com/en-us/research/publication/an …, 2014 | 19 | 2014 |
Training audio captioning models without audio S Deshmukh, B Elizalde, D Emmanouilidou, B Raj, R Singh, H Wang ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 18 | 2024 |
Artificial intelligence system utilizing microphone array and fisheye camera Z Wang, X Huang, L Qin, K Wu, H Wang US Patent App. 15/885,518, 2019 | 18 | 2019 |
Online verification of custom wake word K Shahid, K Kumar, T Yi, V Miljanic, H Wang, Y Gong, HA Khalil US Patent 11,158,305, 2021 | 13 | 2021 |