| Title | Authors | Venue | Cited by | Year |
|---|---|---|---|---|
| DeepNet: Scaling Transformers to 1,000 Layers | H Wang, S Ma, L Dong, S Huang, D Zhang, F Wei | IEEE Transactions on Pattern Analysis and Machine Intelligence | 173 | 2024 |
| The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits | S Ma, H Wang, L Ma, L Wang, W Wang, S Huang, L Dong, R Wang, J Xue, ... | arXiv preprint arXiv:2402.17764 | 167 | 2024 |
| BitNet: Scaling 1-bit Transformers for Large Language Models | H Wang, S Ma, L Dong, S Huang, H Wang, L Ma, F Yang, R Wang, Y Wu, ... | arXiv preprint arXiv:2310.11453 | 103 | 2023 |
| Magneto: A Foundation Transformer | H Wang, S Ma, S Huang, L Dong, W Wang, Z Peng, ... | International Conference on Machine Learning | 44* | 2023 |
| TorchScale: Transformers at Scale | S Ma, H Wang, S Huang, W Wang, Z Chi, L Dong, A Benhaim, B Patra, ... | arXiv preprint arXiv:2211.13184 | 16 | 2022 |
| Q-Sparse: All Large Language Models Can Be Fully Sparsely-Activated | H Wang, S Ma, R Wang, F Wei | arXiv preprint arXiv:2407.10969 | 5 | 2024 |
| M4U: Evaluating Multilingual Understanding and Reasoning for Large Multimodal Models | H Wang, J Xu, S Xie, R Wang, J Li, Z Xie, B Zhang, C Xiong, X Chen | arXiv preprint arXiv:2405.15638 | 3 | 2024 |
| 1-bit AI Infra: Part 1.1, Fast and Lossless BitNet b1.58 Inference on CPUs | J Wang, H Zhou, T Song, S Mao, S Ma, H Wang, Y Xia, F Wei | arXiv preprint arXiv:2410.16144 | 1 | 2024 |
| Robotic Programmer: Video Instructed Policy Code Generation for Robotic Manipulation | S Xie, H Wang, Z Xiao, R Wang, X Chen | arXiv preprint arXiv:2501.04268 | | 2025 |
| BitNet a4.8: 4-bit Activations for 1-bit LLMs | H Wang, S Ma, F Wei | arXiv preprint arXiv:2411.04965 | | 2024 |
| Transformer network with normalization including scaling parameter | S Ma, L Dong, S Huang, D Zhang, F Wei, H Wang | US Patent App. 18/176,037 | | 2024 |