Think twice before trusting: Self-detection for large language models through comprehensive answer reflection
Abstract Self-detection for Large Language Models (LLMs) seeks to evaluate the
trustworthiness of the LLM's output by leveraging its own capabilities, thereby alleviating the …
Dawn-icl: Strategic planning of problem-solving trajectories for zero-shot in-context learning
Zero-shot in-context learning (ZS-ICL) aims to conduct in-context learning (ICL) without
using human-annotated demonstrations. Most ZS-ICL methods use large language models …
Towards Scalable Automated Alignment of LLMs: A Survey
Alignment is the most critical step in building large language models (LLMs) that meet
human needs. With the rapid development of LLMs gradually surpassing human …
TuringQ: Benchmarking AI Comprehension in Theory of Computation
We present TuringQ, the first benchmark designed to evaluate the reasoning capabilities of
large language models (LLMs) in the theory of computation. TuringQ consists of 4,006 …