| Title | Authors | Venue | Cited by | Year |
| --- | --- | --- | --- | --- |
| Fairseq S2T: Fast speech-to-text modeling with fairseq | C Wang, Y Tang, X Ma, A Wu, S Popuri, D Okhonko, J Pino | arXiv preprint arXiv:2010.05171, 2020 | 275 | 2020 |
| Direct speech-to-speech translation with discrete units | A Lee, PJ Chen, C Wang, J Gu, S Popuri, X Ma, A Polyak, Y Adi, Q He, ... | arXiv preprint arXiv:2107.05604, 2021 | 177 | 2021 |
| Textless speech-to-speech translation on real data | A Lee, H Gong, PA Duquenne, H Schwenk, PJ Chen, C Wang, S Popuri, ... | arXiv preprint arXiv:2112.08352, 2021 | 150 | 2021 |
| SeamlessM4T-Massively Multilingual & Multimodal Machine Translation | L Barrault, YA Chung, MC Meglioli, D Dale, N Dong, PA Duquenne, ... | arXiv preprint arXiv:2308.11596, 2023 | 109 | 2023 |
| Seamless: Multilingual Expressive and Streaming Speech Translation | L Barrault, YA Chung, MC Meglioli, D Dale, N Dong, M Duppenthaler, ... | arXiv preprint arXiv:2312.05187, 2023 | 105 | 2023 |
| Enhanced direct speech-to-speech translation using self-supervised pre-training and data augmentation | S Popuri, PJ Chen, C Wang, J Pino, Y Adi, J Gu, WN Hsu, A Lee | arXiv preprint arXiv:2204.02967, 2022 | 68 | 2022 |
| Unity: Two-pass direct speech-to-speech translation with discrete units | H Inaguma, S Popuri, I Kulikov, PJ Chen, C Wang, YA Chung, Y Tang, ... | arXiv preprint arXiv:2212.08055, 2022 | 47 | 2022 |
| SpiRit-LM: Interleaved Spoken and Written Language Model | TA Nguyen, B Muller, B Yu, MR Costa-Jussa, M Elbayad, S Popuri, ... | Transactions of the Association for Computational Linguistics 13, 30-52, 2025 | 28 | 2025 |
| Speech-to-speech translation for a real-world unwritten language | PJ Chen, K Tran, Y Yang, J Du, J Kao, YA Chung, P Tomasello, ... | arXiv preprint arXiv:2211.06474, 2022 | 21 | 2022 |
| Multilingual speech-to-speech translation into multiple target languages | H Gong, N Dong, S Popuri, V Goswami, A Lee, J Pino | arXiv preprint arXiv:2307.08655, 2023 | 10 | 2023 |
| Improving speech-to-speech translation through unlabeled text | XP Nguyen, S Popuri, C Wang, Y Tang, I Kulikov, H Gong | ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 10 | 2023 |
| Law of the weakest link: Cross capabilities of large language models | M Zhong, A Zhang, X Wang, R Hou, W Xiong, C Zhu, Z Chen, L Tan, C Bi, ... | arXiv preprint arXiv:2409.19951, 2024 | 4 | 2024 |
| Investigating decoder-only large language models for speech-to-text translation | CW Huang, H Lu, H Gong, H Inaguma, I Kulikov, R Mavlyutov, S Popuri | arXiv preprint arXiv:2407.03169, 2024 | 4 | 2024 |
| COLLD: Contrastive Layer-to-Layer Distillation for Compressing Multilingual Pre-Trained Speech Encoders | HJ Chang, N Dong, R Mavlyutov, S Popuri, YA Chung | ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 4 | 2024 |
| MSLM-S2ST: A Multitask Speech Language Model for Textless Speech-to-Speech Translation with Speaker Style Preservation | Y Peng, I Kulikov, Y Yang, S Popuri, H Lu, C Wang, H Gong | arXiv preprint arXiv:2403.12408, 2024 | 3 | 2024 |
| Characterizing and efficiently accelerating multimodal generation model inference | Y Lee, A Sun, B Hosmer, B Acun, C Balioglu, C Wang, CD Hernandez, ... | arXiv preprint arXiv:2410.00215, 2024 | 2 | 2024 |
| Exploring Speech Enhancement for Low-resource Speech Synthesis | Z Ni, S Popuri, N Dong, K Saijo, X Zhang, GL Lan, Y Shi, V Chandra, ... | arXiv preprint arXiv:2309.10795, 2023 | 1 | 2023 |
| Massively Multilingual Forced Aligner Leveraging Self-Supervised Discrete Units | H Inaguma, I Kulikov, Z Ni, S Popuri, P Tomasello | 2024 IEEE Spoken Language Technology Workshop (SLT), 899-905, 2024 | | 2024 |
| An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis | Y Peng, I Kulikov, Y Yang, S Popuri, H Lu, C Wang, H Gong | arXiv preprint arXiv:2403.12402, 2024 | | 2024 |