LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-Training
T Zhu, X Qu, D Dong, J Ruan, J Tong, C He, Y Cheng
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, 2024. Cited by 32.
Predicting Large Language Model Capabilities on Closed-Book QA Tasks Using Only Information Available Prior to Training
C Jiang, M Zhang, J Ye, X Fan, Y Cao, J Sun, Z Xi, S Dou, Y Dong, et al.
arXiv preprint arXiv:2502.04066, 2025.