A primer on the inner workings of transformer-based language models
The rapid progress of research aimed at interpreting the inner workings of advanced
language models has highlighted a need for contextualizing the insights gained from years …
A survey on uncertainty quantification of large language models: Taxonomy, open research challenges, and future directions
The remarkable performance of large language models (LLMs) in content generation,
coding, and common-sense reasoning has spurred widespread integration into many facets …
Are you still on track!? Catching LLM Task Drift with Activations
Large Language Models are commonly used in retrieval-augmented applications to execute
user instructions based on data from external sources. For example, modern search engines …
Unpacking SDXL Turbo: Interpreting text-to-image models with sparse autoencoders
Sparse autoencoders (SAEs) have become a core ingredient in the reverse engineering of
large language models (LLMs). For LLMs, they have been shown to decompose …
Evaluating open-source sparse autoencoders on disentangling factual knowledge in GPT-2 Small
A popular new method in mechanistic interpretability is to train high-dimensional sparse
autoencoders (SAEs) on neuron activations and use SAE features as the atomic units of …
Sparse autoencoders reveal universal feature spaces across large language models
We investigate feature universality in large language models (LLMs), a research field that
aims to understand how different models similarly represent concepts in the latent spaces of …
What makes your model a low-empathy or warmth person: Exploring the origins of personality in LLMs
Large language models (LLMs) have demonstrated remarkable capabilities in generating
human-like text and exhibiting personality traits similar to those in humans. However, the …
Applying sparse autoencoders to unlearn knowledge in language models
We investigate whether sparse autoencoders (SAEs) can be used to remove knowledge
from language models. We use the biology subset of the Weapons of Mass Destruction …
Llama Scope: Extracting millions of features from Llama-3.1-8B with sparse autoencoders
Sparse Autoencoders (SAEs) have emerged as a powerful unsupervised method for
extracting sparse representations from language models, yet scalable training remains a …
Improving steering vectors by targeting sparse autoencoder features
S Chalnev, M Siu, A Conmy - arXiv preprint arXiv:2411.02193, 2024 - arxiv.org
To control the behavior of language models, steering methods attempt to ensure that outputs
of the model satisfy specific pre-defined properties. Adding steering vectors to the model is a …