Muning Wen
Verified email at sjtu.edu.cn
Title
Cited by
Year
Trust region policy optimisation in multi-agent reinforcement learning
JG Kuba, R Chen, M Wen, Y Wen, F Sun, J Wang, Y Yang
10th International Conference on Learning Representations, 2021
Cited by 270, 2021
Multi-agent reinforcement learning is a sequence modeling problem
M Wen, J Kuba, R Lin, W Zhang, Y Wen, J Wang, Y Yang
Advances in Neural Information Processing Systems 35, 16509-16521, 2022
Cited by 198, 2022
Offline pre-trained multi-agent decision transformer
L Meng, M Wen, C Le, X Li, D Xing, W Zhang, Y Wen, H Zhang, J Wang, ...
Machine Intelligence Research 20 (2), 233-248, 2023
Cited by 109*, 2023
Alphazero-like tree-search can guide large language model decoding and training
X Feng, Z Wan, M Wen, Y Wen, W Zhang, J Wang
ICML 2024, 2023
Cited by 82, 2023
Settling the variance of multi-agent policy gradients
JG Kuba, M Wen, L Meng, H Zhang, D Mguni, J Wang, Y Yang
Advances in Neural Information Processing Systems 34, 13458-13470, 2021
Cited by 70, 2021
Malib: A parallel framework for population-based multi-agent reinforcement learning
M Zhou, Z Wan, H Wang, M Wen, R Wu, Y Wen, Y Yang, W Zhang, ...
JMLR, 2021
Cited by 59, 2021
Multi-agent constrained policy optimisation
S Gu, JG Kuba, M Wen, R Chen, Z Wang, Z Tian, J Wang, A Knoll, Y Yang
arXiv preprint arXiv:2110.02793, 2021
Cited by 57, 2021
Large sequence models for sequential decision-making: a survey
M Wen, R Lin, H Wang, Y Yang, Y Wen, L Mai, J Wang, H Zhang, ...
Frontiers of Computer Science 17 (6), 176349, 2023
Cited by 33, 2023
Openr: An open source framework for advanced reasoning with large language models
J Wang, M Fang, Z Wan, M Wen, J Zhu, A Liu, Z Gong, Y Song, L Chen, ...
arXiv preprint arXiv:2410.09671, 2024
Cited by 8, 2024
Reinforcing LLM Agents via Policy Optimization with Action Decomposition
M Wen, Z Wan, J Wang, W Zhang, Y Wen
The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024
Cited by 7*, 2024
Hammer: Robust function-calling for on-device language models via function masking
Q Lin, M Wen, Q Peng, G Nie, J Liao, J Wang, X Mo, J Zhou, C Cheng, ...
arXiv preprint arXiv:2410.04587, 2024
Cited by 6, 2024
TRAD: Enhancing LLM Agents with Step-Wise Thought Retrieval and Aligned Decision
R Zhou, Y Yang, M Wen, Y Wen, W Wang, C Xi, G Xu, Y Yu, W Zhang
Proceedings of the 47th International ACM SIGIR Conference on Research and …, 2024
Cited by 4, 2024
Safe Multiagent Learning With Soft Constrained Policy Optimization in Real Robot Control
S Gu, D Huang, M Wen, G Chen, A Knoll
IEEE Transactions on Industrial Informatics, 2024
Cited by 4, 2024
Entropy-Regularized Token-Level Policy Optimization for Large Language Models
M Wen, C Deng, J Wang, W Zhang, Y Wen
arXiv preprint arXiv:2402.06700, 2024
Cited by 4, 2024
RoMAT: Role-based multi-agent transformer for generalizable heterogeneous cooperation
D Wang, F Zhong, M Wen, M Li, Y Peng, T Li, Y Yang
Neural Networks, 106129, 2024
Cited by 4, 2024
Autonomous goal detection and cessation in reinforcement learning: A case study on source term estimation
Y Shi, M Wen, Q Zhang, W Zhang, C Liu, W Liu
arXiv preprint arXiv:2409.09541, 2024
Cited by 3, 2024
Hammerbench: Fine-grained function-calling evaluation in real mobile device scenarios
J Wang, J Zhou, M Wen, X Mo, H Zhang, Q Lin, C Jin, X Wang, W Zhang, ...
arXiv preprint arXiv:2412.16516, 2024
Cited by 1, 2024
P3: A Policy-Driven, Pace-Adaptive, and Diversity-Promoted Framework for Optimizing LLM Training
Y Yang, H Wang, M Wen, W Zhang
arXiv e-prints, arXiv: 2408.05541, 2024
2024
RDHNet: Addressing Rotational and Permutational Symmetries in Continuous Multi-Agent Systems
D Wang, L Huang, M Wen, X Teng, T Li, M Li
Open-Ended Learning in General-Sum Games: The Role of Diversity in Correlated Equilibrium
Z Zhao, M Wen, Y Wen, Y Yang