Internal consistency and self-feedback in large language models: A survey

X Liang, S Song, Z Zheng, H Wang, Q Yu, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations.
To address these, studies prefixed with "Self-" such as Self-Consistency, Self-Improve, and …
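
(Note: Self-Consistency, the best-known of these "Self-" methods, samples several independent reasoning chains from the same model and keeps the most frequent final answer. The sketch below only illustrates that majority-vote step and is not code from the surveyed paper; self_consistency_answer and noisy_solver are hypothetical stand-ins for a real model call that samples at nonzero temperature and parses out the final answer.)

```python
import random
from collections import Counter

def self_consistency_answer(question, sample_fn, n_samples=10):
    """Self-Consistency: draw several independent answers and
    return the one that appears most often (majority vote)."""
    answers = [sample_fn(question) for _ in range(n_samples)]
    best_answer, _count = Counter(answers).most_common(1)[0]
    return best_answer

# Toy stand-in for an LLM call: a noisy solver that is right most of the time.
def noisy_solver(question):
    return "42" if random.random() < 0.7 else random.choice(["41", "43"])

print(self_consistency_answer("What is 6 x 7?", noisy_solver, n_samples=15))
```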

UHGEval: Benchmarking the hallucination of Chinese large language models via unconstrained generation

X Liang, S Song, S Niu, Z Li, F Xiong, B Tang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have emerged as pivotal contributors in contemporary
natural language processing and are increasingly being applied across a diverse range of …

Attention heads of large language models: A survey

Z Zheng, Y Wang, Y Huang, S Song, M Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Since the advent of ChatGPT, Large Language Models (LLMs) have excelled in various
tasks but remain as black-box systems. Consequently, the reasoning bottlenecks of LLMs …

PertEval: Unveiling real knowledge capacity of LLMs with knowledge-invariant perturbations

J Li, R Hu, K Huang, Y Zhuang, Q Liu, M Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Expert-designed close-ended benchmarks are indispensable in assessing the knowledge
capacity of large language models (LLMs). Despite their widespread use, concerns have …

How Well Do LLMs Handle Cantonese? Benchmarking Cantonese Capabilities of Large Language Models

J Jiang, P Chen, L Chen, S Wang, Q Bao… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid evolution of large language models (LLMs) has transformed the competitive
landscape in natural language processing (NLP), particularly for English and other data-rich …

GRAIT: Gradient-Driven Refusal-Aware Instruction Tuning for Effective Hallucination Mitigation

R Zhu, Z Jiang, J Wu, Z Ma, J Song, F Bai, D Lin… - arXiv preprint arXiv …, 2025 - arxiv.org
Refusal-Aware Instruction Tuning (RAIT) aims to enhance Large Language Models (LLMs)
by improving their ability to refuse responses to questions beyond their knowledge, thereby …

GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism

C Tang, B Lv, Z Zheng, B Yang, K Zhao, N Liao… - arXiv preprint arXiv …, 2025 - arxiv.org
Traditional Mixture-of-Experts (MoE) networks benefit from utilizing multiple smaller expert
models as opposed to a single large network. However, these experts typically operate …
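
(Note: for readers unfamiliar with the baseline this paper extends, a standard MoE layer routes each token through a small number of experts chosen by a learned gate and mixes their outputs. The NumPy sketch below illustrates softmax gating with top-k selection under assumed shapes; it is not the GRAPHMOE implementation.)

```python
import numpy as np

def moe_layer(x, gate_w, expert_ws, top_k=2):
    """Minimal Mixture-of-Experts forward pass for a single token vector x.

    x         : (d,) input vector
    gate_w    : (d, n_experts) gating weights
    expert_ws : list of (d, d) weight matrices, one per expert
    """
    logits = x @ gate_w                              # gating score per expert
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()                             # softmax over experts
    top = np.argsort(probs)[-top_k:]                 # indices of the k best experts
    weights = probs[top] / probs[top].sum()          # renormalise their gate weights
    # weighted sum of the selected experts' outputs
    return sum(w * (x @ expert_ws[i]) for w, i in zip(weights, top))

# Usage: 4 experts, 8-dimensional token (random weights for illustration).
rng = np.random.default_rng(0)
d, n_experts = 8, 4
out = moe_layer(rng.normal(size=d),
                rng.normal(size=(d, n_experts)),
                [rng.normal(size=(d, d)) for _ in range(n_experts)])
```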

TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles

Q Yu, S Song, K Fang, Y Shi, Z Zheng, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
As the application of Large Language Models (LLMs) expands, the demand for reliable
evaluations increases. Existing LLM evaluation benchmarks primarily rely on static datasets …

LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation

D Schlangen - arXiv preprint arXiv:2407.13744, 2024 - arxiv.org
Natural Language Processing has moved rather quickly from modelling specific tasks to
taking more general pre-trained models and fine-tuning them for specific tasks, to a point …

Attention heads of large language models

Z Zheng, Y Wang, Y Huang, S Song, M Yang, B Tang… - Patterns - cell.com
Large language models (LLMs) have demonstrated performance approaching human levels
in tasks such as long-text comprehension and mathematical reasoning, but they remain …