Exploring the limits of transfer learning with a unified text-to-text transformer (2019) C Raffel, N Shazeer, A Roberts, K Lee, S Narang, M Matena, Y Zhou, W Li, ... arXiv preprint arXiv:1910.10683, 2020 | 21424* | 2020 |
PaLM 2 technical report R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 1558 | 2023 |
A streaming on-device end-to-end model surpassing server-side conventional model quality and latency TN Sainath, Y He, B Li, A Narayanan, R Pang, A Bruguier, S Chang, W Li, ... ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020 | 235 | 2020 |
Monotonic infinite lookback attention for simultaneous machine translation N Arivazhagan, C Cherry, W Macherey, CC Chiu, S Yavuz, R Pang, W Li, ... arXiv preprint arXiv:1906.05218, 2019 | 203 | 2019 |
Two-pass end-to-end speech recognition TN Sainath, R Pang, D Rybach, Y He, R Prabhavalkar, W Li, M Visontai, ... arXiv preprint arXiv:1908.10992, 2019 | 171 | 2019 |
Delving into out-of-distribution detection with vision-language representations Y Ming, Z Cai, J Gu, Y Sun, W Li, Y Li Advances in neural information processing systems 35, 35087-35102, 2022 | 165 | 2022 |
Do transformer modifications transfer across implementations and applications? S Narang, HW Chung, Y Tay, W Fedus, T Fevry, M Matena, K Malkan, ... arXiv preprint arXiv:2102.11972, 2021 | 110 | 2021 |
VoiceFilter-Lite: Streaming targeted voice separation for on-device speech recognition Q Wang, IL Moreno, M Saglam, K Wilson, A Chiao, R Liu, Y He, W Li, ... arXiv preprint arXiv:2009.04323, 2020 | 104 | 2020 |
Streaming small-footprint keyword spotting using sequence-to-sequence models Y He, R Prabhavalkar, K Rao, W Li, A Bakhtin, I McGraw 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2017 | 104 | 2017 |
PaLM 2 technical report R Anil, AM Dai, O Firat, M Johnson, D Lepikhin, A Passos, S Shakeri, ... arXiv preprint arXiv:2305.10403, 2023 | 83 | 2023 |
Tied & reduced RNN-T decoder R Botros, TN Sainath, R David, E Guzman, W Li, Y He arXiv preprint arXiv:2109.07513, 2021 | 59 | 2021 |
An Efficient Streaming Non-Recurrent On-Device End-to-End Model with Improvements to Rare-Word Modeling. TN Sainath, Y He, A Narayanan, R Botros, R Pang, D Rybach, C Allauzen, ... Interspeech 8, 1777-1781, 2021 | 47 | 2021 |
Video instruction tuning with synthetic data Y Zhang, J Wu, W Li, B Li, Z Ma, Z Liu, C Li arXiv preprint arXiv:2410.02713, 2024 | 44 | 2024 |
Learning word-level confidence for subword end-to-end automatic speech recognition D Qiu, Q Li, Y He, Y Zhang, B Li, L Cao, R Prabhavalkar, D Bhatia, W Li, ... US Patent 11,610,586, 2023 | 33 | 2023 |
Learning word-level confidence for subword end-to-end ASR D Qiu, Q Li, Y He, Y Zhang, B Li, L Cao, R Prabhavalkar, D Bhatia, W Li, ... ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 33 | 2021 |
MaMMUT: A simple architecture for joint learning for multimodal tasks W Kuo, AJ Piergiovanni, D Kim, X Luo, B Caine, W Li, A Ogale, L Zhou, ... arXiv preprint arXiv:2303.16839, 2023 | 27 | 2023 |
Low Latency Speech Recognition Using End-to-End Prefetching. SY Chang, B Li, D Rybach, Y He, W Li, TN Sainath, T Strohman Interspeech, 1962-1966, 2020 | 20 | 2020 |
Answer-Me: Multi-task open-vocabulary visual question answering AJ Piergiovanni, W Li, W Kuo, M Saffar, F Bertsch, A Angelova arXiv preprint arXiv:2205.00949, 2022 | 18 | 2022 |
Key phrase spotting W Li, RP Prabhavalkar, KK Rao, Y He, IC McGraw, A Bakhtin US Patent 11,295,739, 2022 | 17 | 2022 |
Parallel rescoring with transformer for streaming on-device speech recognition W Li, J Qin, CC Chiu, R Pang, Y He Interspeech 2020, 2020 | 17 | 2020 |