A Survey of Confidence Estimation and Calibration in Large Language Models
Large language models (LLMs) have demonstrated remarkable capabilities across a wide
range of tasks in various domains. Despite their impressive performance, they can be …
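As an aside, a minimal Python sketch of expected calibration error (ECE), one of the standard calibration metrics covered in this survey literature; the bin count, function name, and toy inputs below are illustrative assumptions, not details taken from the cited paper.

    def expected_calibration_error(confidences, correct, n_bins=10):
        """Average |accuracy - confidence| over equal-width confidence bins,
        weighted by the fraction of predictions falling in each bin."""
        total = len(confidences)
        ece = 0.0
        for b in range(n_bins):
            lo, hi = b / n_bins, (b + 1) / n_bins
            in_bin = [i for i, c in enumerate(confidences)
                      if lo < c <= hi or (b == 0 and c == 0.0)]
            if not in_bin:
                continue
            avg_conf = sum(confidences[i] for i in in_bin) / len(in_bin)
            acc = sum(correct[i] for i in in_bin) / len(in_bin)
            ece += (len(in_bin) / total) * abs(acc - avg_conf)
        return ece

    # Toy usage: per-answer confidences and whether each answer was correct.
    print(expected_calibration_error([0.9, 0.8, 0.6, 0.95], [1, 1, 0, 1]))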
Adaptation with self-evaluation to improve selective prediction in LLMs
Large language models (LLMs) have recently shown great advances in a variety of tasks,
including natural language understanding and generation. However, their use in high …
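For illustration only, a threshold-based selective prediction sketch in Python, where a model answers only when its confidence clears a threshold and abstains otherwise; the scoring function, threshold value, and stand-in model are assumptions for this example, not the method of the paper above.

    from typing import Callable, Optional, Tuple

    def selective_predict(question: str,
                          answer_fn: Callable[[str], Tuple[str, float]],
                          threshold: float = 0.75) -> Optional[str]:
        """Return the model's answer if its self-reported confidence is high
        enough, otherwise abstain (return None)."""
        answer, confidence = answer_fn(question)  # hypothetical model call
        return answer if confidence >= threshold else None

    # Toy usage with a stand-in model returning a fixed answer and confidence.
    demo_model = lambda q: ("Paris", 0.62)
    print(selective_predict("What is the capital of France?", demo_model))  # None -> abstain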
Mitigating temporal misalignment by discarding outdated facts
While large language models are able to retain vast amounts of world knowledge seen
during pretraining, such knowledge is prone to going out of date and is nontrivial to update …
Do LLMs know when to not answer? Investigating abstention abilities of large language models
Abstention Ability (AA) is a critical aspect of Large Language Model (LLM) reliability,
referring to an LLM's capability to withhold responses when uncertain or lacking a definitive …
Can NLP Models 'Identify', 'Distinguish', and 'Justify' Questions that Don't have a Definitive Answer?
Though state-of-the-art (SOTA) NLP systems have achieved remarkable performance on a
variety of language understanding tasks, they primarily focus on questions that have a …
Crowd-Calibrator: Can Annotator Disagreement Inform Calibration in Subjective Tasks?
Subjective tasks in NLP have been mostly relegated to objective standards, where the gold
label is decided by taking the majority vote. This obfuscates annotator disagreement and the …
Accelerating LLM inference by enabling intermediate layer decoding
Large Language Models (LLMs) have achieved remarkable performance across a wide
variety of natural language tasks; however, their large size makes their inference slow and …
Ambiguity meets uncertainty: Investigating uncertainty estimation for word sense disambiguation
Z. Liu, Y. Liu. arXiv preprint arXiv:2305.13119, 2023.
Word sense disambiguation (WSD), which aims to determine an appropriate sense for a
target word given its context, is crucial for natural language understanding. Existing …
LLMs' Reading Comprehension Is Affected by Parametric Knowledge and Struggles with Hypothetical Statements
The task of reading comprehension (RC), often implemented as context-based question
answering (QA), provides a primary means to assess language models' natural language …