Runji Lin
Title
Cited by
Year
Qwen technical report
J Bai, S Bai, Y Chu, Z Cui, K Dang, X Deng, Y Fan, W Ge, Y Han, F Huang, ...
arXiv preprint arXiv:2309.16609, 2023
Cited by: 2628; Year: 2023
Qwen2.5 technical report
A Yang, B Yang, B Zhang, B Hui, B Zheng, B Yu, C Li, D Liu, F Huang, ...
arXiv preprint arXiv:2412.15115, 2024
Cited by: 1142; Year: 2024
Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
M Wen, JG Kuba, R Lin, W Zhang, Y Wen, J Wang, Y Yang
NeurIPS 2022, 2022
Cited by: 218; Year: 2022
#InsTag: Instruction tagging for analyzing supervised fine-tuning of large language models
K Lu, H Yuan, Z Yuan, R Lin, J Lin, C Tan, C Zhou, J Zhou
arXiv preprint arXiv:2308.07074, 2023
Cited by: 94; Year: 2023
Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement
A Yang, B Zhang, B Hui, B Gao, B Yu, C Li, D Liu, J Tu, J Zhou, J Lin, K Lu, ...
arXiv preprint arXiv:2409.12122, 2024
Cited by: 87*; Year: 2024
Routing to the expert: Efficient reward-guided ensemble of large language models
K Lu, H Yuan, R Lin, J Lin, Z Yuan, C Zhou, J Zhou
NAACL, 2023
Cited by: 66; Year: 2023
QwQ: Reflect deeply on the boundaries of the unknown
Q Team
Hugging Face, 2024
Cited by: 45*; Year: 2024
Large language models play starcraft ii: Benchmarks and a chain of summarization approach
W Ma, Q Mi, X Yan, Y Wu, R Lin, H Zhang, J Wang
NeurIPS 2024, 2023
Cited by: 44; Year: 2023
Large Sequence Models for Sequential Decision-Making: A Survey
M Wen, R Lin, H Wang, Y Yang, Y Wen, L Mai, J Wang, H Zhang, ...
Frontiers of Computer Science, 2023
Cited by: 38; Year: 2023
Processbench: Identifying process errors in mathematical reasoning
C Zheng, Z Zhang, B Zhang, R Lin, K Lu, B Yu, D Liu, J Zhou, J Lin
arXiv preprint arXiv:2412.06559, 2024
Cited by: 22; Year: 2024
Online merging optimizers for boosting rewards and mitigating tax in alignment
K Lu, B Yu, F Huang, Y Fan, R Lin, C Zhou
arXiv preprint arXiv:2405.17931, 2024
Cited by: 20; Year: 2024
The lessons of developing process reward models in mathematical reasoning
Z Zhang, C Zheng, Y Wu, B Zhang, R Lin, B Yu, D Liu, J Zhou, J Lin
arXiv preprint arXiv:2501.07301, 2025
Cited by: 17; Year: 2025
LLM critics help catch bugs in mathematics: Towards a better mathematical verifier with natural language feedback
B Gao, Z Cai, R Xu, P Wang, C Zheng, R Lin, K Lu, J Lin, C Zhou, W Xiao, ...
arXiv preprint arXiv:2406.14024, 2024
Cited by: 12*; Year: 2024
Learn to flap: Foil non-parametric path planning via deep reinforcement learning
ZP Wang, RJ Lin, ZY Zhao, X Chen, PM Guo, N Yang, ZC Wang, DX Fan
Journal of Fluid Mechanics 984, A9, 2024
Cited by: 11; Year: 2024
Scalable Model-based Policy Optimization for Decentralized Networked Systems
Y Du, C Ma, Y Liu, R Lin, H Dong, J Wang, Y Yang
IROS 2022, 2022
Cited by: 11*; Year: 2022
Contextual Transformer for Offline Meta Reinforcement Learning
R Lin, Y Li, X Feng, Z Zhang, XHW Fung, H Zhang, J Wang, Y Du, Y Yang
NeurIPS 2022 Workshop: Foundation Models for Decision Making, 2022
Cited by: 10; Year: 2022
Increasing the Data Rate for Reflected Optical Camera Communication Using Uniform LED Light
Z Chen, R Lin, H Duan, Y Chen, Y Yang, R Wu, L Chen
IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops …, 2020
Cited by: 1; Year: 2020
Online Decision MetaMorphFormer: A Casual Transformer-Based Reinforcement Learning Framework of Universal Embodied Intelligence
L Ji, R Lin
arXiv preprint arXiv:2409.07341, 2024
Year: 2024
Learning Robust Communication by Adversarial Training in Networked System Control
R Lin, H Zhang
Chinese Conference on Swarm Intelligence and Cooperative Control, 605-619, 2023
Year: 2023