Dissociating language and thought in large language models
Large language models (LLMs) have come closest among all models to date to mastering
human language, yet opinions about their linguistic and cognitive capabilities remain split …
A systematic survey and critical review on evaluating large language models: Challenges, limitations, and recommendations
Large Language Models (LLMs) have recently gained significant attention due to
their remarkable capabilities in performing diverse tasks across various domains. However …
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
State of what art? A call for multi-prompt LLM evaluation
Recent advances in LLMs have led to an abundance of evaluation benchmarks, which
typically rely on a single instruction template per task. We create a large-scale collection of …
Do LLMs exhibit human-like response biases? A case study in survey design
One widely cited barrier to the adoption of LLMs as proxies for humans in subjective tasks is
their sensitivity to prompt wording—but interestingly, humans also display sensitivities to …
Who validates the validators? Aligning LLM-assisted evaluation of LLM outputs with human preferences
Due to the cumbersome nature of human evaluation and limitations of code-based
evaluation, Large Language Models (LLMs) are increasingly being used to assist humans in …
Rethinking interpretability in the era of large language models
Interpretable machine learning has exploded as an area of interest over the last decade,
sparked by the rise of increasingly large datasets and deep neural networks …
Internal consistency and self-feedback in large language models: A survey
Large language models (LLMs) often exhibit deficient reasoning or generate hallucinations.
To address these, studies prefixed with "Self-" such as Self-Consistency, Self-Improve, and …
Let me speak freely? A study on the impact of format restrictions on performance of large language models
Structured generation, the process of producing content in standardized formats like JSON
and XML, is widely utilized in real-world applications to extract key output information from …
Open problems in technical AI governance
AI progress is creating a growing range of risks and opportunities, but it is often unclear how
they should be navigated. In many cases, the barriers and uncertainties faced are at least …