A survey of confidence estimation and calibration in large language models

J Geng, F Cai, Y Wang, H Koeppl, P Nakov… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities across a wide
range of tasks in various domains. Despite their impressive performance, they can be …

Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond

J Yang, H **, R Tang, X Han, Q Feng, H Jiang… - ACM Transactions on …, 2024 - dl.acm.org
This article presents a comprehensive and practical guide for practitioners and end-users
working with Large Language Models (LLMs) in their downstream Natural Language …

Trustworthy LLMs: a survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

Active prompting with chain-of-thought for large language models

S Diao, P Wang, Y Lin, R Pan, X Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
The increasing scale of large language models (LLMs) brings emergent abilities to various
complex tasks requiring reasoning, such as arithmetic and commonsense reasoning. It is …

The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

B Plank - arXiv preprint arXiv:2211.02570, 2022 - arxiv.org
Human variation in labeling is often considered noise. Annotation projects for machine
learning (ML) aim at minimizing human label variation, with the assumption to maximize …

Teaching models to express their uncertainty in words

S Lin, J Hilton, O Evans - arXiv preprint arXiv:2205.14334, 2022 - arxiv.org
We show that a GPT-3 model can learn to express uncertainty about its own answers in
natural language--without use of model logits. When given a question, the model generates …

A survey of mix-based data augmentation: Taxonomy, methods, applications, and explainability

C Cao, F Zhou, Y Dai, J Wang, K Zhang - ACM Computing Surveys, 2024 - dl.acm.org
Data augmentation (DA) is indispensable in modern machine learning and deep neural
networks. The basic idea of DA is to construct new training data to improve the model's …

Knowledgeable prompt-tuning: Incorporating knowledge into prompt verbalizer for text classification

S Hu, N Ding, H Wang, Z Liu, J Wang, J Li… - arXiv preprint arXiv …, 2021 - arxiv.org
Tuning pre-trained language models (PLMs) with task-specific prompts has been a
promising approach for text classification. Particularly, previous studies suggest that prompt …

Navigating the grey area: How expressions of uncertainty and overconfidence affect language models

K Zhou, D Jurafsky, T Hashimoto - arXiv preprint arXiv:2302.13439, 2023 - arxiv.org
The increased deployment of LMs for real-world tasks involving knowledge and facts makes
it important to understand model epistemology: what LMs think they know, and how their …

How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering

Z Jiang, J Araki, H Ding, G Neubig - Transactions of the Association …, 2021 - direct.mit.edu
Recent works have shown that language models (LMs) capture different types of knowledge
regarding facts or common sense. However, because no model is perfect, they still fail to …