Teaching models to express their uncertainty in words

S Lin, J Hilton, O Evans - arXiv preprint arXiv:2205.14334, 2022 - arxiv.org
We show that a GPT-3 model can learn to express uncertainty about its own answers in
natural language--without use of model logits. When given a question, the model generates …

Approaching human-level forecasting with language models

D Halawi, F Zhang, C Yueh-Han… - arXiv preprint arXiv …, 2024 - arxiv.org
Forecasting future events is important for policy and decision making. In this work, we study
whether language models (LMs) can forecast at the level of competitive human forecasters …

Natural selection favors AIs over humans

D Hendrycks - arXiv preprint arXiv:2303.16200, 2023 - arxiv.org
For billions of years, evolution has been the driving force behind the development of life,
including humans. Evolution endowed humans with high intelligence, which allowed us to …

Back to the future: Towards explainable temporal reasoning with large language models

C Yuan, Q Xie, J Huang, S Ananiadou - Proceedings of the ACM Web …, 2024 - dl.acm.org
Temporal reasoning is a crucial natural language processing (NLP) task, providing a
nuanced understanding of time-sensitive contexts within textual data. Although recent …

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

R Ren, S Basart, A Khoja, A Gatti, L Phan, X Yin… - arXiv preprint arXiv …, 2024 - arxiv.org
As artificial intelligence systems grow more powerful, there has been increasing interest in
"AI safety" research to address emerging and future risks. However, the field of AI safety …

Temporal knowledge graph forecasting without knowledge using in-context learning

DH Lee, K Ahrabian, W Jin, F Morstatter… - arXiv preprint arXiv …, 2023 - arxiv.org
Temporal knowledge graph (TKG) forecasting benchmarks challenge models to predict
future facts using knowledge of past facts. In this paper, we apply large language models …

Evaluating superhuman models with consistency checks

L Fluri, D Paleka, F Tramèr - 2024 IEEE Conference on Secure …, 2024 - ieeexplore.ieee.org
If machine learning models were to achieve superhuman abilities at various reasoning or
decision-making tasks, how would we go about evaluating such models, given that humans …

[BOOK][B] Introduction to AI safety, ethics, and society

D Hendrycks - 2025 - library.oapen.org
As AI technology is rapidly progressing in capability and being adopted more widely across
society, it is more important than ever to understand the potential risks AI may pose and how …

[HTML] Humans vs. large language models: Judgmental forecasting in an era of advanced AI

M Abolghasemi, O Ganbold, K Rotaru - International Journal of Forecasting, 2024 - Elsevier
This study investigates the forecasting accuracy of human experts versus large language
models (LLMs) in the retail sector, particularly during standard and promotional sales …

MIRAI: Evaluating LLM agents for event forecasting

C Ye, Z Hu, Y Deng, Z Huang, MD Ma, Y Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in Large Language Models (LLMs) have empowered LLM agents to
autonomously collect world information, over which to conduct reasoning to solve complex …