From generation to judgment: Opportunities and challenges of LLM-as-a-judge

D Li, B Jiang, L Huang, A Beigi, C Zhao, Z Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …

Multi-modal and multi-agent systems meet rationality: A survey

B Jiang, Y Xie, X Wang, WJ Su, CJ Taylor… - ICML 2024 Workshop …, 2024 - openreview.net
Rationality is characterized by logical thinking and decision-making that align with evidence
and logical rules. This quality is essential for effective problem-solving, as it ensures that …

DebUnc: Mitigating hallucinations in large language model agent communication with uncertainty estimations

L Yoffe, A Amayuelas, WY Wang - arXiv preprint arXiv:2407.06426, 2024 - arxiv.org
To enhance Large Language Model (LLM) capabilities, multi-agent debates have been
introduced, where multiple LLMs discuss solutions to a problem over several rounds of …

A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions

O Shorinwa, Z Mei, J Lidard, AZ Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
The remarkable performance of large language models (LLMs) in content generation,
coding, and common-sense reasoning has spurred widespread integration into many facets …

Do LLMs know when to not answer? Investigating abstention abilities of large language models

N Madhusudhan, ST Madhusudhan, V Yadav… - arXiv preprint arXiv …, 2024 - arxiv.org
Abstention Ability (AA) is a critical aspect of Large Language Model (LLM) reliability,
referring to an LLM's capability to withhold responses when uncertain or lacking a definitive …

Deliberate reasoning for LLMs as structure-aware planning with accurate world model

S Xiong, A Payani, Y Yang, F Fekri - arXiv preprint arXiv:2410.03136, 2024 - arxiv.org
Enhancing the reasoning capabilities of large language models (LLMs) remains a key
challenge, especially for tasks that require complex, multi-step decision-making. Humans …

Understanding the relationship between prompts and response uncertainty in large language models

ZY Zhang, A Verma, F Doshi-Velez… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are widely used in decision-making, but their reliability,
especially in critical tasks like healthcare, is not well-established. Therefore, understanding …

Cross-Refine: Improving Natural Language Explanation Generation by Learning in Tandem

Q Wang, T Anikina, N Feldhus, S Ostermann… - arXiv preprint arXiv …, 2024 - arxiv.org
Natural language explanations (NLEs) are vital for elucidating the reasoning behind large
language model (LLM) decisions. Many techniques have been developed to generate NLEs …

FactTest: Factuality Testing in Large Language Models with Statistical Guarantees

F Nie, X Hou, S Lin, J Zou, H Yao, L Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
The propensity of Large Language Models (LLMs) to generate hallucinations and non-
factual content undermines their reliability in high-stakes domains, where rigorous control …

MACAROON: Training Vision-Language Models To Be Your Engaged Partners

S Wu, YR Fung, S Li, Y Wan, KW Chang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (LVLMs), while proficient in following instructions and
responding to diverse questions, invariably generate detailed responses even when …