VALTEST: Automated Validation of Language Model Generated Test Cases

H Taherkhani, H Hemmati - arXiv preprint arXiv:2411.08254, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated significant potential in automating
software testing, specifically in generating unit test cases. However, the validation of LLM …

Representation Engineering for Large-Language Models: Survey and Research Challenges

L Bartoszcze, S Munshi, B Sukidi, J Yen, Z Yang… - arXiv preprint arXiv …, 2025 - arxiv.org
Large-language models are capable of completing a variety of tasks, but remain
unpredictable and intractable. Representation engineering seeks to resolve this problem …

Think More, Hallucinate Less: Mitigating Hallucinations via Dual Process of Fast and Slow Thinking

X Cheng, J Li, WX Zhao, JR Wen - arXiv preprint arXiv:2501.01306, 2025 - arxiv.org
Large language models (LLMs) demonstrate exceptional capabilities, yet still face the
hallucination issue. Typical text generation approaches adopt an auto-regressive generation …

HalluCana: Fixing LLM Hallucination with A Canary Lookahead

T Li, E Dayanik, S Tyagi, A Pierleoni - arXiv preprint arXiv:2412.07965, 2024 - arxiv.org
In this paper, we present HalluCana, a canary lookahead to detect and correct factuality
hallucinations of Large Language Models (LLMs) in long-form generation. HalluCana …

VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding

K Kim, G Park, Y Lee, W Yeo, SJ Hwang - arXiv preprint arXiv:2412.02186, 2024 - arxiv.org
Recent advancements in video large multimodal models (LMMs) have significantly improved
their video understanding and reasoning capabilities. However, their performance drops on …

CoCo-CoLa: Evaluating Language Adherence in Multilingual LLMs

E Rahmati, AS Ziabari, M Dehghani - arXiv preprint arXiv:2502.12476, 2025 - arxiv.org
Multilingual Large Language Models (LLMs) develop cross-lingual abilities despite being
trained on limited parallel data. However, they often struggle to generate responses in the …

Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models

Q Liu, X Chen, Y Ding, S Xu, S Wu, L Wang - arXiv preprint arXiv …, 2025 - arxiv.org
Hallucination has emerged as a significant barrier to the effective application of Large
Language Models (LLMs). In this work, we introduce a novel Attention-Guided SElf …

CoT-UQ: Improving Response-wise Uncertainty Quantification in LLMs with Chain-of-Thought

B Zhang, R Zhang - arXiv preprint arXiv:2502.17214, 2025 - arxiv.org
Large language models (LLMs) excel in many tasks but struggle to accurately quantify
uncertainty in their generated responses. This limitation makes it challenging to detect …

Sample, Scrutinize and Scale: Effective Inference-Time Search by Scaling Verification

E Zhao, P Awasthi, S Gollapudi - arXiv preprint arXiv:2502.01839, 2025 - arxiv.org
Sampling-based search, a simple paradigm for utilizing test-time compute, involves
generating multiple candidate responses and selecting the best one, typically by verifying …

Collaborative Instance Navigation: Leveraging Agent Self-Dialogue to Minimize User Input

F Taioli, E Zorzi, G Franchi, A Castellini… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing embodied instance goal navigation tasks, driven by natural language, assume
human users to provide complete and nuanced instance descriptions prior to the navigation …