From generation to judgment: Opportunities and challenges of LLM-as-a-judge
Assessment and evaluation have long been critical challenges in artificial intelligence (AI)
and natural language processing (NLP). However, traditional methods, whether matching …
Multi-modal and multi-agent systems meet rationality: A survey
Rationality is characterized by logical thinking and decision-making that align with evidence
and logical rules. This quality is essential for effective problem-solving, as it ensures that …
DebUnc: mitigating hallucinations in large language model agent communication with uncertainty estimations
To enhance Large Language Model (LLM) capabilities, multi-agent debates have been
introduced, where multiple LLMs discuss solutions to a problem over several rounds of …
A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions
The remarkable performance of large language models (LLMs) in content generation,
coding, and common-sense reasoning has spurred widespread integration into many facets …
Do LLMs know when to not answer? Investigating abstention abilities of large language models
Abstention Ability (AA) is a critical aspect of Large Language Model (LLM) reliability,
referring to an LLM's capability to withhold responses when uncertain or lacking a definitive …
Deliberate reasoning for LLMs as structure-aware planning with accurate world model
Enhancing the reasoning capabilities of large language models (LLMs) remains a key
challenge, especially for tasks that require complex, multi-step decision-making. Humans …
Understanding the relationship between prompts and response uncertainty in large language models
Large language models (LLMs) are widely used in decision-making, but their reliability,
especially in critical tasks like healthcare, is not well-established. Therefore, understanding …
Cross-Refine: Improving Natural Language Explanation Generation by Learning in Tandem
Natural language explanations (NLEs) are vital for elucidating the reasoning behind large
language model (LLM) decisions. Many techniques have been developed to generate NLEs …
FactTest: Factuality Testing in Large Language Models with Statistical Guarantees
The propensity of Large Language Models (LLMs) to generate hallucinations and non-factual
content undermines their reliability in high-stakes domains, where rigorous control …
MACAROON: Training Vision-Language Models To Be Your Engaged Partners
Large vision-language models (LVLMs), while proficient in following instructions and
responding to diverse questions, invariably generate detailed responses even when …