AlphaZero-like tree-search can guide large language model decoding and training Z Wan, X Feng, M Wen, SM McAleer, Y Wen, W Zhang, J Wang Forty-first International Conference on Machine Learning, 2024 | 83 | 2024 |
MALib: A parallel framework for population-based multi-agent reinforcement learning M Zhou, Z Wan, H Wang, M Wen, R Wu, Y Wen, Y Yang, Y Yu, J Wang, ... Journal of Machine Learning Research 24 (150), 1-12, 2023 | 59 | 2023 |
Neural auto-curricula in two-player zero-sum games X Feng, O Slumbers, Z Wan, B Liu, S McAleer, Y Wen, J Wang, Y Yang Advances in Neural Information Processing Systems 34, 3504-3517, 2021 | 52* | 2021 |
Order matters: Agent-by-agent policy optimization X Wang, Z Tian, Z Wan, Y Wen, J Wang, W Zhang arXiv preprint arXiv:2302.06205, 2023 | 23 | 2023 |
On realization of intelligent decision-making in the real world: A foundation decision model perspective Y Wen, Z Wan, M Zhou, S Hou, Z Cao, C Le, J Chen, Z Tian, W Zhang, ... arXiv preprint arXiv:2212.12669, 2022 | 9 | 2022 |
OpenR: An open source framework for advanced reasoning with large language models J Wang, M Fang, Z Wan, M Wen, J Zhu, A Liu, Z Gong, Y Song, L Chen, ... arXiv preprint arXiv:2410.09671, 2024 | 8 | 2024 |
Reinforcing LLM Agents via Policy Optimization with Action Decomposition M Wen, Z Wan, J Wang, W Zhang, Y Wen The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024 | 7* | 2024 |
Natural language reinforcement learning X Feng, Z Wan, H Fu, B Liu, M Yang, GA Koushik, Z Hu, Y Wen, J Wang arXiv preprint arXiv:2411.14251, 2024 | 2 | 2024 |
Language Games as the Pathway to Artificial Superhuman Intelligence Y Wen, Z Wan, S Zhang arXiv preprint arXiv:2501.18924, 2025 | | 2025 |