Uncertainty quantification in machine learning for engineering design and health prognostics: A tutorial

V Nemani, L Biggio, X Huan, Z Hu, O Fink… - … Systems and Signal …, 2023 - Elsevier
On top of machine learning (ML) models, uncertainty quantification (UQ) functions as an
essential layer of safety assurance that could lead to more principled decision making by …

Understanding metric-related pitfalls in image analysis validation

A Reinke, MD Tizabi, M Baumgartner, M Eisenmann… - Nature …, 2024 - nature.com
Validation metrics are key for tracking scientific progress and bridging the current chasm
between artificial intelligence research and its translation into practice. However, increasing …

Scaling vision transformers to 22 billion parameters

M Dehghani, J Djolonga, B Mustafa… - International …, 2023 - proceedings.mlr.press
The scaling of Transformers has driven breakthrough capabilities for language models. At
present, the largest large language models (LLMs) contain upwards of 100B parameters …

Holistic evaluation of language models

P Liang, R Bommasani, T Lee, D Tsipras… - arxiv preprint arxiv …, 2022 - arxiv.org
Language models (LMs) are becoming the foundation for almost all major language
technologies, but their capabilities, limitations, and risks are not well understood. We present …

Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs

M Xiong, Z Hu, X Lu, Y Li, J Fu, J He, B Hooi - arxiv preprint arxiv …, 2023 - arxiv.org
Empowering large language models to accurately express confidence in their answers is
essential for trustworthy decision-making. Previous confidence elicitation methods, which …

Beyond the imitation game: Quantifying and extrapolating the capabilities of language models

A Srivastava, A Rastogi, A Rao, AAM Shoeb… - arxiv preprint arxiv …, 2022 - arxiv.org
Language models demonstrate both quantitative improvement and new qualitative
capabilities with increasing scale. Despite their potentially transformative impact, these new …

Prompting GPT-3 to be reliable

C Si, Z Gan, Z Yang, S Wang, J Wang… - arxiv preprint arxiv …, 2022 - arxiv.org
Large language models (LLMs) show impressive abilities via few-shot prompting.
Commercialized APIs such as OpenAI GPT-3 further increase their use in real-world …

Mitigating neural network overconfidence with logit normalization

H Wei, R Xie, H Cheng, L Feng… - … conference on machine …, 2022 - proceedings.mlr.press
Detecting out-of-distribution inputs is critical for the safe deployment of machine learning
models in the real world. However, neural networks are known to suffer from the …

Contrastive test-time adaptation

D Chen, D Wang, T Darrell… - Proceedings of the …, 2022 - openaccess.thecvf.com
Test-time adaptation is a special setting of unsupervised domain adaptation where a trained
model on the source domain has to adapt to the target domain without accessing source …

The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

B Plank - arxiv preprint arxiv:2211.02570, 2022 - arxiv.org
Human variation in labeling is often considered noise. Annotation projects for machine
learning (ML) aim at minimizing human label variation, with the assumption to maximize …