AI alignment: A comprehensive survey
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, so do risks from misalignment. To provide a comprehensive …
Mechanistic Interpretability for AI Safety--A Review
Understanding AI systems' inner workings is critical for ensuring value alignment and safety.
This review explores mechanistic interpretability: reverse engineering the computational …
KAN: Kolmogorov-Arnold networks
Inspired by the Kolmogorov-Arnold representation theorem, we propose Kolmogorov-Arnold
Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs …
Towards automated circuit discovery for mechanistic interpretability
Through considerable effort and intuition, several recent works have reverse-engineered
nontrivial behaviors of transformer models. This paper systematizes the mechanistic …
Language models represent space and time
The capabilities of large language models (LLMs) have sparked debate over whether such
systems just learn an enormous collection of superficial statistics or a set of more coherent …
Scaling and evaluating sparse autoencoders
Sparse autoencoders provide a promising unsupervised approach for extracting
interpretable features from a language model by reconstructing activations from a sparse …
Eliciting latent predictions from transformers with the tuned lens
We analyze transformers from the perspective of iterative inference, seeking to understand
how model predictions are refined layer by layer. To do so, we train an affine probe for each …
Toward transparent AI: A survey on interpreting the inner structures of deep neural networks
The last decade of machine learning has seen drastic increases in scale and capabilities.
Deep neural networks (DNNs) are increasingly being deployed in the real world. However …
Birth of a transformer: A memory viewpoint
Large language models based on transformers have achieved great empirical successes.
However, as they are deployed more widely, there is a growing need to better understand …
The clock and the pizza: Two stories in mechanistic explanation of neural networks
Do neural networks, trained on well-understood algorithmic tasks, reliably rediscover known
algorithms? Several recent studies, on tasks ranging from group operations to in-context …