A Survey on Large Language Models with some Insights on their Capabilities and Limitations

A Matarazzo, R Torlone - arXiv preprint arXiv:2501.04040, 2025 - arxiv.org
The rapid advancement of artificial intelligence, particularly with the development of Large
Language Models (LLMs) built on the transformer architecture, has redefined the …

Improving Video Generation with Human Feedback

J Liu, G Liu, J Liang, Z Yuan, X Liu, M Zheng… - arXiv preprint arXiv …, 2025 - arxiv.org
Video generation has achieved significant advances through rectified flow techniques, but
issues like unsmooth motion and misalignment between videos and prompts persist. In this …

MedS³: Towards Medical Small Language Models with Self-Evolved Slow Thinking

S Jiang, Y Liao, Z Chen, Y Zhang, Y Wang… - arXiv preprint arXiv …, 2025 - arxiv.org
Medical language models (MLMs) have become pivotal in advancing medical natural
language processing. However, prior models that rely on pre-training or supervised fine …

Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization

Z Yu, T Xu, D Jin, KA Sankararaman, Y He… - arXiv preprint arXiv …, 2025 - arxiv.org
Solving mathematics problems has been an intriguing capability of large language models,
and many efforts have been made to improve reasoning by extending reasoning length …

JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models

MK Chen, X Zhang, D Tao - arXiv preprint arXiv:2501.14851, 2025 - arxiv.org
Logical reasoning is a critical component of Large Language Models (LLMs), and
substantial research efforts in recent years have aimed to enhance their deductive …

Chain-of-Retrieval Augmented Generation

L Wang, H Chen, N Yang, X Huang, Z Dou… - arXiv preprint arXiv …, 2025 - arxiv.org
This paper introduces an approach for training o1-like RAG models that retrieve and reason
over relevant information step by step before generating the final answer. Conventional RAG …

Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework

Y Sun, Z Yin, X Huang, X Qiu, H Zhao - arXiv preprint arXiv:2501.15581, 2025 - arxiv.org
Large Language Models (LLMs) have demonstrated remarkable capabilities across various
domains. Math Word Problems (MWPs) serve as a crucial benchmark for evaluating LLMs' …

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Y Zuo, S Qu, Y Li, Z Chen, X Zhu, E Hua… - arXiv preprint arXiv …, 2025 - arxiv.org
We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate
expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 …

Kimi k1.5: Scaling Reinforcement Learning with LLMs

K Team, A Du, B Gao, B Xing, C Jiang, C Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
Language model pretraining with next token prediction has proved effective for scaling
compute but is limited by the amount of available training data. Scaling reinforcement …

Revisiting Rogers' Paradox in the Context of Human-AI Interaction

KM Collins, U Bhatt, I Sucholutsky - arXiv preprint arXiv:2501.10476, 2025 - arxiv.org
Humans learn about the world, and how to act in the world, in many ways: from individually
conducting experiments to observing and reproducing others' behavior. Different learning …