Shuming Ma
Microsoft Research Asia
Verified email at microsoft.com - Homepage
Title
Cited by
Year
Kosmos-2: Grounding multimodal large language models to the world
Z Peng, W Wang, L Dong, Y Hao, S Huang, S Ma, F Wei
arXiv preprint arXiv:2306.14824, 2023
604 · 2023
SGM: sequence generation model for multi-label classification
P Yang, X Sun, W Li, S Ma, W Wu, H Wang
arXiv preprint arXiv:1806.04822, 2018
501 · 2018
Language is not all you need: Aligning perception with language models
S Huang, L Dong, W Wang, Y Hao, S Singhal, S Ma, T Lv, L Cui, ...
Advances in Neural Information Processing Systems 36, 72096-72109, 2023
478 · 2023
Why can GPT learn in-context? Language models implicitly perform gradient descent as meta-optimizers
D Dai, Y Sun, L Dong, Y Hao, S Ma, Z Sui, F Wei
arXiv preprint arXiv:2212.10559, 2022
355 · 2022
Retentive network: A successor to transformer for large language models
Y Sun, L Dong, S Huang, S Ma, Y Xia, J Xue, J Wang, F Wei
arXiv preprint arXiv:2307.08621, 2023
300 · 2023
meProp: Sparsified back propagation for accelerated deep learning with reduced overfitting
X Sun, X Ren, S Ma, H Wang
International Conference on Machine Learning, 3299-3308, 2017
202 · 2017
Global encoding for abstractive summarization
J Lin, X Sun, S Ma, Q Su
arXiv preprint arXiv:1805.03989, 2018
201 · 2018
A whole-slide foundation model for digital pathology from real-world data
H Xu, N Usuyama, J Bagga, S Zhang, R Rao, T Naumann, C Wong, ...
Nature, 1-8, 2024
171 · 2024
DeepNet: Scaling transformers to 1,000 layers
H Wang, S Ma, L Dong, S Huang, D Zhang, F Wei
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
171 · 2024
The era of 1-bit LLMs: All large language models are in 1.58 bits
S Ma, H Wang, L Ma, L Wang, W Wang, S Huang, L Dong, R Wang, J Xue, ...
arXiv preprint arXiv:2402.17764, 2024
166 · 2024
LongNet: Scaling transformers to 1,000,000,000 tokens
J Ding, S Ma, L Dong, X Zhang, S Huang, W Wang, N Zheng, F Wei
arXiv preprint arXiv:2307.02486, 2023
152 · 2023
A length-extrapolatable transformer
Y Sun, L Dong, B Patra, S Ma, S Huang, A Benhaim, V Chaudhary, ...
arXiv preprint arXiv:2212.10554, 2022
149 · 2022
XLM-E: Cross-lingual Language Model Pre-training via ELECTRA
Z Chi
arXiv preprint arXiv:2106.16138, 2021
133 · 2021
Language models are general-purpose interfaces
Y Hao, H Song, L Dong, S Huang, Z Chi, W Wang, S Ma, F Wei
arXiv preprint arXiv:2206.06336, 2022
108 · 2022
A simple and effective unified encoder for document-level machine translation
S Ma, D Zhang, M Zhou
Proceedings of the 58th annual meeting of the association for computational …, 2020
105 · 2020
BitNet: Scaling 1-bit transformers for large language models
H Wang, S Ma, L Dong, S Huang, H Wang, L Ma, F Yang, R Wang, Y Wu, ...
arXiv preprint arXiv:2310.11453, 2023
101 · 2023
Alternating language modeling for cross-lingual pre-training
J Yang, S Ma, D Zhang, S Wu, Z Li, M Zhou
Proceedings of the AAAI Conference on Artificial Intelligence 34 (05), 9386-9393, 2020
95 · 2020
mT6: Multilingual Pretrained Text-to-Text Transformer with Translation Pairs
Z Chi
arXiv preprint arXiv:2104.08692, 2021
84 · 2021
On the representation collapse of sparse mixture of experts
Z Chi, L Dong, S Huang, D Dai, S Ma, B Patra, S Singhal, P Bajaj, X Song, ...
Advances in Neural Information Processing Systems 35, 34600-34613, 2022
83 · 2022
Improving semantic relevance for sequence-to-sequence learning of chinese social media text summarization
S Ma, X Sun, J Xu, H Wang, W Li, Q Su
arXiv preprint arXiv:1706.02459, 2017
83 · 2017
Articles 1–20