DeepSeek LLM: Scaling open-source language models with longtermism X Bi, D Chen, G Chen, S Chen, D Dai, C Deng, H Ding, K Dong, Q Du, ... arXiv preprint arXiv:2401.02954, 2024 | 228 | 2024 |
DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models D Dai, C Deng, C Zhao, RX Xu, H Gao, D Chen, J Li, W Zeng, X Yu, Y Wu, ... arXiv preprint arXiv:2401.06066, 2024 | 180 | 2024 |
DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning D Guo, D Yang, H Zhang, J Song, R Zhang, R Xu, Q Zhu, S Ma, P Wang, ... arXiv preprint arXiv:2501.12948, 2025 | 174 | 2025 |
DeepSeek-V3 technical report A Liu, B Feng, B Xue, B Wang, B Wu, C Lu, C Zhao, C Deng, C Zhang, ... arXiv preprint arXiv:2412.19437, 2024 | 145 | 2024 |
DeepSeek-Coder-V2: Breaking the barrier of closed-source models in code intelligence Q Zhu, D Guo, Z Shao, D Yang, P Wang, R Xu, Y Wu, Y Li, H Gao, S Ma, ... arXiv preprint arXiv:2406.11931, 2024 | 134 | 2024 |
DeepSeek-V2: A strong, economical, and efficient mixture-of-experts language model A Liu, B Feng, B Wang, B Wang, B Liu, C Zhao, C Deng, C Ruan, D Dai, ... arXiv preprint arXiv:2405.04434, 2024 | 132 | 2024 |
DeepSeek-V3 technical report AL DeepSeek-AI, B Feng, B Xue, B Wang, B Wu, C Lu, C Zhao, C Deng, ... arXiv preprint arXiv:2412.19437, 4, 2024 | 10 | 2024 |
Auxiliary-loss-free load balancing strategy for mixture-of-experts L Wang, H Gao, C Zhao, X Sun, D Dai arXiv preprint arXiv:2408.15664, 2024 | 4 | 2024 |
Critique of “Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility” by SCC Team From Tsinghua University C Zhang, C Zhao, J He, S Chen, L Zheng, K Huang, W Han, J Zhai IEEE Transactions on Parallel and Distributed Systems 32 (11), 2631-2634, 2021 | 2 | 2021 |
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning W An, X Bi, G Chen, S Chen, C Deng, H Ding, K Dong, Q Du, W Gao, ... SC24: International Conference for High Performance Computing, Networking …, 2024 | 1 | 2024 |
Canvas: End-to-End Kernel Architecture Search in Neural Networks C Zhao, G Zhang, M Gao arXiv preprint arXiv:2304.07741, 2023 | 1 | 2023 |
Student Cluster Competition 2018, Team Tsinghua University: Reproducing performance of multi-physics simulations of the Tsunamigenic 2004 Sumatra megathrust earthquake on the … J He, C Zhao, J Yu, X Yu, L Zheng, C Lou, S Tang, W Han, J Zhai Parallel Computing 90, 102570, 2019 | | 2019 |