Assessing the Impact of Conspiracy Theories Using Large Language Models

B Jiang, D Li, Z Tan, X Zhou, A Rao, K Lerman… - arXiv preprint arXiv …, 2024 - arxiv.org
Measuring the relative impact of CTs is important for prioritizing responses and allocating
resources effectively, especially during crises. However, assessing the actual impact of CTs …

Self-Supervised Prompt Optimization

J Xiang, J Zhang, Z Yu, F Teng, J Tu, X Liang… - arXiv preprint arXiv …, 2025 - arxiv.org
Well-designed prompts are crucial for enhancing large language models' (LLMs) reasoning
capabilities while aligning their outputs with task requirements across diverse domains …

MMRC: A Large-Scale Benchmark for Understanding Multimodal Large Language Model in Real-World Conversation

H Xue, F Tang, M Hu, Y Liu, Q Huang, Y Li… - arXiv preprint arXiv …, 2025 - arxiv.org
Recent multimodal large language models (MLLMs) have demonstrated significant potential
in open-ended conversation, generating more accurate and personalized responses …

Crowd Comparative Reasoning: Unlocking Comprehensive Evaluations for LLM-as-a-Judge

Q Zhang, Y Wang, Y Jiang, L Li, C Wu, Y Wang… - arXiv preprint arXiv …, 2025 - arxiv.org
LLM-as-a-Judge, which generates chain-of-thought (CoT) judgments, has become a widely
adopted auto-evaluation method. However, its reliability is compromised by the CoT …

Outcome-Refining Process Supervision for Code Generation

Z Yu, W Gu, Y Wang, Z Zeng, J Wang, W Ye… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models have demonstrated remarkable capabilities in code generation, yet
they often struggle with complex programming tasks that require deep algorithmic …

MCQA-Eval: Efficient Confidence Evaluation in NLG with Gold-Standard Correctness Labels

X Liu, Z Lin, L Da, C Chen, S Trivedi, H Wei - arXiv preprint arXiv …, 2025 - arxiv.org
Large Language Models (LLMs) require robust confidence estimation, particularly in critical
domains like healthcare and law where unreliable outputs can lead to significant …

SCALE: Towards Collaborative Content Analysis in Social Science with Large Language Model Agents and Human Intervention

C Zhao, Z Tan, CW Wong, X Zhao, T Chen… - arXiv preprint arXiv …, 2025 - arxiv.org
Content analysis breaks down complex and unstructured texts into theory-informed
numerical categories. Particularly, in social science, this process usually relies on multiple …

Position: LLMs Can be Good Tutors in Foreign Language Education

J Ye, S Wang, D Zou, Y Yan, K Wang, HT Zheng… - arXiv preprint arXiv …, 2025 - arxiv.org
While recent efforts have begun integrating large language models (LLMs) into foreign
language education (FLE), they often rely on traditional approaches to learning tasks without …

TiEBe: A Benchmark for Assessing the Current Knowledge of Large Language Models

TS Almeida, GK Bonás, JGA Santos, H Abonizio… - arXiv preprint arXiv …, 2025 - arxiv.org
In a rapidly evolving knowledge landscape and the increasing adoption of large language
models, a need has emerged to keep these models continuously updated with current …

RefineCoder: Iterative Improving of Large Language Models via Adaptive Critique Refinement for Code Generation

C Zhou, X Zhang, D Song, X Chen, W Gu, H Ma… - arXiv preprint arXiv …, 2025 - arxiv.org
Code generation has attracted increasing attention with the rise of Large Language Models
(LLMs). Many studies have developed powerful code LLMs by synthesizing code-related …