Building math agents with multi-turn iterative preference learning
Recent studies have shown that large language models' (LLMs) mathematical
problem-solving capabilities can be enhanced by integrating external tools, such as code …
Improve vision language model chain-of-thought reasoning
Chain-of-thought (CoT) reasoning in vision language models (VLMs) is crucial for improving
interpretability and trustworthiness. However, current training recipes lack robust CoT …
Improving Multi-Step Reasoning Abilities of Large Language Models with Direct Advantage Policy Optimization
The role of reinforcement learning (RL) in enhancing the reasoning of large language
models (LLMs) is becoming increasingly significant. Despite the success of RL in many …
Can We Generate Images with CoT? Let's Verify and Reinforce Image Generation Step by Step
Chain-of-Thought (CoT) reasoning has been extensively explored in large models to tackle
complex understanding tasks. However, it still remains an open question whether such …
Autoregressive Multi-trait Essay Scoring via Reinforcement Learning with Scoring-aware Multiple Rewards
Recent advances in automated essay scoring (AES) have shifted towards evaluating
multiple traits to provide enriched feedback. Like typical AES systems, multi-trait AES …
Mars-PO: Multi-Agent Reasoning System Preference Optimization
Mathematical reasoning is a fundamental capability for large language models (LLMs), yet
achieving high performance in this domain remains a significant challenge. The auto …