A Survey on Large Language Models with some Insights on their Capabilities and Limitations
A Matarazzo, R Torlone - arXiv preprint arXiv:2501.04040, 2025 - arxiv.org
The rapid advancement of artificial intelligence, particularly with the development of Large
Language Models (LLMs) built on the transformer architecture, has redefined the …
Improving Video Generation with Human Feedback
Video generation has achieved significant advances through rectified flow techniques, but
issues like unsmooth motion and misalignment between videos and prompts persist. In this …
MedS: Towards Medical Small Language Models with Self-Evolved Slow Thinking
Medical language models (MLMs) have become pivotal in advancing medical natural
language processing. However, prior models that rely on pre-training or supervised fine …
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Solving mathematics problems has been an intriguing capability of large language models,
and many efforts have been made to improve reasoning by extending reasoning length …
JustLogic: A Comprehensive Benchmark for Evaluating Deductive Reasoning in Large Language Models
Logical reasoning is a critical component of Large Language Models (LLMs), and
substantial research efforts in recent years have aimed to enhance their deductive …
Chain-of-Retrieval Augmented Generation
This paper introduces an approach for training o1-like RAG models that retrieve and reason
over relevant information step by step before generating the final answer. Conventional RAG …
Error Classification of Large Language Models on Math Word Problems: A Dynamically Adaptive Framework
Large Language Models (LLMs) have demonstrated remarkable capabilities across various
domains. Math Word Problems (MWPs) serve as a crucial benchmark for evaluating LLMs' …
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate
expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 …
Kimi k1.5: Scaling Reinforcement Learning with LLMs
K Team, A Du, B Gao, B **ng, C Jiang, C Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
Language model pretraining with next token prediction has proved effective for scaling
compute but is limited to the amount of available training data. Scaling reinforcement …
Revisiting Rogers' Paradox in the Context of Human-AI Interaction
Humans learn about the world, and how to act in the world, in many ways: from individually
conducting experiments to observing and reproducing others' behavior. Different learning …