PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation W Zeng, X Ren, T Su, H Wang, Y Liao, Z Wang, X Jiang, ZZ Yang, K Wang, ... arXiv preprint arXiv:2104.12369, 2021 | 252 | 2021 |
NEZHA: Neural contextualized representation for Chinese language understanding J Wei, X Ren, X Li, W Huang, Y Liao, Y Wang, J Lin, X Jiang, X Chen, ... arXiv preprint arXiv:1909.00204, 2019 | 148 | 2019 |
PixArt-Σ: Weak-to-strong training of diffusion transformer for 4K text-to-image generation J Chen, C Ge, E Xie, Y Wu, L Yao, X Ren, Z Wang, P Luo, H Lu, Z Li ECCV 2024, 2024 | 121 | 2024 |
PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing X Ren, P Zhou, X Meng, X Huang, Y Wang, W Wang, P Li, X Zhang, ... arXiv preprint arXiv:2303.10845, 2023 | 88 | 2023 |
Response length perception and sequence scheduling: An LLM-empowered LLM inference pipeline Z Zheng, X Ren, F Xue, Y Luo, X Jiang, Y You Advances in Neural Information Processing Systems 36, 65517-65530, 2023 | 54 | 2023 |
A survey of reasoning with foundation models J Sun, C Zheng, E Xie, Z Liu, R Chu, J Qiu, J Xu, M Ding, H Li, M Geng, ... arXiv preprint arXiv:2312.11562, 2023 | 51 | 2023 |
SparseBERT: Rethinking the importance analysis in self-attention H Shi, J Gao, X Ren, H Xu, X Liang, Z Li, JTY Kwok International Conference on Machine Learning, 9547-9557, 2021 | 51 | 2021 |
AutoBERT-Zero: Evolving BERT backbone from scratch J Gao, H Xu, H Shi, X Ren, LH Philip, X Liang, X Jiang, Z Li Proceedings of the AAAI Conference on Artificial Intelligence 36 (10), 10663 …, 2022 | 42 | 2022 |
EfficientBERT: Progressively searching multilayer perceptron via warm-up knowledge distillation C Dong, G Wang, H Xu, J Peng, X Ren, X Liang Findings of the Association for Computational Linguistics: EMNLP 2021, 2021 | 24 | 2021 |
NumGPT: Improving numeracy ability of generative pre-trained models Z Jin, X Jiang, X Wang, Q Liu, Y Wang, X Ren, H Qu arXiv preprint arXiv:2109.03137, 2021 | 22 | 2021 |
CAME: Confidence-guided adaptive memory efficient optimization Y Luo, X Ren, Z Zheng, Z Jiang, X Jiang, Y You ACL 2023 (Outstanding Paper Award), 2023 | 20 | 2023 |
ScheMoE: An extensible mixture-of-experts distributed training system with tasks scheduling S Shi, X Pan, Q Wang, C Liu, X Ren, Z Hu, Y Yang, B Li, X Chu Proceedings of the Nineteenth European Conference on Computer Systems, 236-249, 2024 | 18 | 2024 |
One student knows all experts know: From sparse to dense F Xue, X He, X Ren, Y Lou, Y You arXiv preprint arXiv:2201.10890, 2022 | 18 | 2022 |
Large-scale deep learning optimizations: A comprehensive survey X He, F Xue, X Ren, Y You arXiv preprint arXiv:2111.00856, 2021 | 18 | 2021 |
EdgeFM: Leveraging foundation model for open-set learning on the edge B Yang, L He, N Ling, Z Yan, G Xing, X Shuai, X Ren, X Jiang Proceedings of the 21st ACM Conference on Embedded Networked Sensor Systems …, 2023 | 16 | 2023 |
PanGu-α: Large-scale autoregressive pretrained Chinese language models with auto-parallel computation W Zeng, X Ren, T Su, H Wang, Y Liao, Z Wang, X Jiang, Z Yang, K Wang, ... arXiv preprint arXiv:2104.12369, 2021 | 16 | |
PanGu W Zeng, X Ren, T Su, H Wang, Y Liao, Z Wang, X Jiang, ZZ Yang, K Wang, ... Large-scale Autoregressive Pretrained Chinese Language Models with Auto …, 2021 | 9 | 2021 |
A study on transformer configuration and training objective F Xue, J Chen, A Sun, X Ren, Z Zheng, X He, Y Chen, X Jiang, Y You International Conference on Machine Learning, 38913-38925, 2023 | 7 | 2023 |
DAPE: Data-Adaptive Positional Encoding for Length Extrapolation C Zheng, Y Gao, H Shi, M Huang, J Li, J Xiong, X Ren, M Ng, X Jiang, Z Li, ... Advances in Neural Information Processing Systems 37, 26659-26700, 2025 | 5 | 2025 |
Deeper vs wider: A revisit of transformer configuration F Xue, J Chen, A Sun, X Ren, Z Zheng, X He, X Jiang, Y You arXiv preprint arXiv:2205.10505, 2022 | 5 | 2022 |