Symbols and grounding in large language models
E Pavlick - … Transactions of the Royal Society A, 2023 - royalsocietypublishing.org
Large language models (LLMs) are one of the most impressive achievements of artificial
intelligence in recent years. However, their relevance to the study of language more broadly …
Mechanistic Interpretability for AI Safety--A Review
Understanding AI systems' inner workings is critical for ensuring value alignment and safety.
This review explores mechanistic interpretability: reverse engineering the computational …
Towards automated circuit discovery for mechanistic interpretability
Through considerable effort and intuition, several recent works have reverse-engineered
nontrivial behaviors of transformer models. This paper systematizes the mechanistic …
Reasoning or reciting? Exploring the capabilities and limitations of language models through counterfactual tasks
The impressive performance of recent language models across a wide range of tasks
suggests that they possess a degree of abstract reasoning skills. Are these skills general …
Interpretability in the wild: A circuit for indirect object identification in GPT-2 small
Research in mechanistic interpretability seeks to explain behaviors of machine learning
models in terms of their internal components. However, most previous work either focuses …
How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Pre-trained language models can be surprisingly adept at tasks they were not explicitly
trained on, but how they implement these capabilities is poorly understood. In this paper, we …
Foundational challenges in assuring alignment and safety of large language models
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …
Language models as agent models
J Andreas - arXiv preprint arXiv:2212.01681, 2022 - arxiv.org
Language models (LMs) are trained on collections of documents, written by individual
human agents to achieve specific goals in an outside world. During training, LMs have …
Toward transparent AI: A survey on interpreting the inner structures of deep neural networks
The last decade of machine learning has seen drastic increases in scale and capabilities.
Deep neural networks (DNNs) are increasingly being deployed in the real world. However …
Interpretability at scale: Identifying causal mechanisms in alpaca
Obtaining human-interpretable explanations of large, general-purpose language models is
an urgent goal for AI safety. However, it is just as important that our interpretability methods …