Kosmos-2: Grounding multimodal large language models to the world Z Peng, W Wang, L Dong, Y Hao, S Huang, S Ma, F Wei arXiv preprint arXiv:2306.14824, 2023 | 617 | 2023 |
SGM: sequence generation model for multi-label classification P Yang, X Sun, W Li, S Ma, W Wu, H Wang arXiv preprint arXiv:1806.04822, 2018 | 502 | 2018 |
Language is not all you need: Aligning perception with language models S Huang, L Dong, W Wang, Y Hao, S Singhal, S Ma, T Lv, L Cui, ... Advances in Neural Information Processing Systems 36, 72096-72109, 2023 | 483 | 2023 |
Why can GPT learn in-context? Language models implicitly perform gradient descent as meta-optimizers D Dai, Y Sun, L Dong, Y Hao, S Ma, Z Sui, F Wei arXiv preprint arXiv:2212.10559, 2022 | 383 | 2022 |
Retentive network: A successor to transformer for large language models Y Sun, L Dong, S Huang, S Ma, Y Xia, J Xue, J Wang, F Wei arXiv preprint arXiv:2307.08621, 2023 | 326 | 2023 |
Global encoding for abstractive summarization J Lin, X Sun, S Ma, Q Su arXiv preprint arXiv:1805.03989, 2018 | 201 | 2018 |
meProp: Sparsified back propagation for accelerated deep learning with reduced overfitting X Sun, X Ren, S Ma, H Wang International Conference on Machine Learning, 3299-3308, 2017 | 200 | 2017 |
A whole-slide foundation model for digital pathology from real-world data H Xu, N Usuyama, J Bagga, S Zhang, R Rao, T Naumann, C Wong, ... Nature 630 (8015), 181-188, 2024 | 196 | 2024 |
BitNet: Scaling 1-bit transformers for large language models H Wang, S Ma, L Dong, S Huang, H Wang, L Ma, F Yang, R Wang, Y Wu, ... arXiv preprint arXiv:2310.11453, 2023 | 183 | 2023 |
DeepNet: Scaling transformers to 1,000 layers H Wang, S Ma, L Dong, S Huang, D Zhang, F Wei IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024 | 178 | 2024 |
The era of 1-bit LLMs: All large language models are in 1.58 bits S Ma, H Wang, L Ma, L Wang, W Wang, S Huang, L Dong, R Wang, J Xue, ... arXiv preprint arXiv:2402.17764, 2024 | 175 | 2024 |
LongNet: Scaling transformers to 1,000,000,000 tokens J Ding, S Ma, L Dong, X Zhang, S Huang, W Wang, N Zheng, F Wei arXiv preprint arXiv:2307.02486, 2023 | 164 | 2023 |
A length-extrapolatable transformer Y Sun, L Dong, B Patra, S Ma, S Huang, A Benhaim, V Chaudhary, ... arXiv preprint arXiv:2212.10554, 2022 | 154 | 2022 |
XLM-E: Cross-lingual language model pre-training via ELECTRA Z Chi, S Huang, L Dong, S Ma, B Zheng, S Singhal, P Bajaj, X Song, ... arXiv preprint arXiv:2106.16138, 2021 | 132 | 2021 |
Language models are general-purpose interfaces Y Hao, H Song, L Dong, S Huang, Z Chi, W Wang, S Ma, F Wei arXiv preprint arXiv:2206.06336, 2022 | 105 | 2022 |
A simple and effective unified encoder for document-level machine translation S Ma, D Zhang, M Zhou Proceedings of the 58th Annual Meeting of the Association for Computational …, 2020 | 104 | 2020 |
Language is not all you need: Aligning perception with language models S Huang, L Dong, W Wang, Y Hao, S Singhal, S Ma, T Lv, L Cui, S Som, X Song, F Wei …, 2023 | 96 | 2023 |
Alternating language modeling for cross-lingual pre-training J Yang, S Ma, D Zhang, S Wu, Z Li, M Zhou Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 9386-9393, 2020 | 95 | 2020 |
On the representation collapse of sparse mixture of experts Z Chi, L Dong, S Huang, D Dai, S Ma, B Patra, S Singhal, P Bajaj, X Song, ... Advances in Neural Information Processing Systems 35, 34600-34613, 2022 | 90 | 2022 |
mT6: Multilingual pretrained text-to-text transformer with translation pairs Z Chi, L Dong, S Ma, S Huang, XL Mao, H Huang, F Wei arXiv preprint arXiv:2104.08692, 2021 | 84 | 2021 |