ConceptMath: A bilingual concept-wise benchmark for measuring mathematical reasoning of large language models

Y Wu, J Liu, X Bu, J Liu, Z Zhou, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper introduces ConceptMath, a bilingual (English and Chinese), fine-grained
benchmark that evaluates concept-wise mathematical reasoning of Large Language Models …

CFBench: A comprehensive constraints-following benchmark for LLMs

T Zhang, Y Shen, W Luo, Y Zhang, H Liang… - arXiv preprint arXiv …, 2024 - arxiv.org
The adeptness of Large Language Models (LLMs) in comprehending and following natural
language instructions is critical for their deployment in sophisticated real-world applications …

Survey of cultural awareness in language models: Text and beyond

S Pawar, J Park, J **, A Arora, J Myung… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale deployment of large language models (LLMs) in various applications, such as
chatbots and virtual assistants, requires LLMs to be culturally sensitive to the user to ensure …

WenMind: A Comprehensive Benchmark for Evaluating Large Language Models in Chinese Classical Literature and Language Arts

J Cao, Y Liu, Y Shi, K Ding… - Advances in Neural …, 2025 - proceedings.neurips.cc
Large Language Models (LLMs) have made significant advancements across
numerous domains, but their capabilities in Chinese Classical Literature and Language Arts …

CS4: Measuring the Creativity of Large Language Models Automatically by Controlling the Number of Story-Writing Constraints

A Atmakuru, J Nainani, RSR Bheemreddy… - arXiv preprint arXiv …, 2024 - arxiv.org
Evaluating the creativity of large language models (LLMs) in story writing is difficult because
LLM-generated stories could seemingly look creative but be very similar to some existing …

IOPO: Empowering LLMs with Complex Instruction Following via Input-Output Preference Optimization

X Zhang, H Yu, C Fu, F Huang, Y Li - arXiv preprint arXiv:2411.06208, 2024 - arxiv.org
In the realm of large language models (LLMs), the ability of models to accurately follow
instructions is paramount as more agents and applications leverage LLMs for construction …

Latent Learningscape Guided In-context Learning

A Zhou, S Jiang, Y Liu, Y Wu, K Kuang… - Findings of the …, 2024 - aclanthology.org
The growing interest in leveraging large language models is driven by their exceptional
imitation and reasoning capabilities. In-context learning (ICL), a streamlined method, has …