Knowledge mechanisms in large language models: A survey and perspective

M Wang, Y Yao, Z Xu, S Qiao, S Deng, P Wang… - arXiv preprint arXiv…, 2024 - arxiv.org
Understanding knowledge mechanisms in Large Language Models (LLMs) is crucial for
advancing towards trustworthy AGI. This paper reviews knowledge mechanism analysis …

Knowledge conflicts for LLMs: A survey

R Xu, Z Qi, Z Guo, C Wang, H Wang, Y Zhang… - arXiv preprint arXiv…, 2024 - arxiv.org
This survey provides an in-depth analysis of knowledge conflicts for large language models
(LLMs), highlighting the complex challenges they encounter when blending contextual and …

Finding visual task vectors

A Hojel, Y Bai, T Darrell, A Globerson, A Bar - European Conference on …, 2024 - Springer
Visual Prompting is a technique for teaching models to perform a visual task via in-context
examples, without any additional training. In this work, we analyze the activations of MAE …
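
For intuition, here is a minimal sketch of the task-vector idea transposed to a small language model (the paper itself patches activations of MAE-VQGAN on image grids, which requires image data); the gpt2 checkpoint, layer index, and prompts are illustrative assumptions, not the paper's setup:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    site = model.transformer.h[5]  # illustrative layer choice, not from the paper

    # 1) Average the last-position hidden state over in-context prompts.
    acts = []
    hook = site.register_forward_hook(
        lambda m, i, o: acts.append(o[0][:, -1].detach()))
    for p in ["big -> small, hot -> cold, tall ->",
              "up -> down, fast -> slow, wet ->"]:
        with torch.no_grad():
            model(**tok(p, return_tensors="pt"))
    hook.remove()
    task_vector = torch.cat(acts).mean(0)

    # 2) Patch the averaged vector into a zero-shot run at the same site.
    def patch(module, inputs, output):
        hidden = output[0].clone()
        hidden[:, -1] = task_vector  # overwrite the last position only
        return (hidden,) + output[1:]

    hook = site.register_forward_hook(patch)
    with torch.no_grad():
        logits = model(**tok("loud ->", return_tensors="pt")).logits[0, -1]
    hook.remove()
    print(tok.decode(logits.argmax().item()))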

Attention heads of large language models: A survey

Z Zheng, Y Wang, Y Huang, S Song, M Yang… - arXiv preprint arXiv…, 2024 - arxiv.org
Since the advent of ChatGPT, Large Language Models (LLMs) have excelled in various
tasks but remain black-box systems. Consequently, the reasoning bottlenecks of LLMs …
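
As a concrete starting point for the per-head analysis this survey taxonomizes, the following hedged sketch inspects raw attention patterns via Hugging Face transformers; the gpt2 checkpoint, the layer index, and the prompt are arbitrary illustrative choices:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("When Mary and John went to the store, John gave a drink to",
              return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_attentions=True)

    # out.attentions holds one (batch, n_head, seq, seq) tensor per layer.
    layer_attn = out.attentions[5][0]  # layer 5, illustrative
    for head in range(layer_attn.shape[0]):
        # Which earlier token does the final position attend to most?
        top = layer_attn[head, -1].argmax().item()
        token = tok.decode(ids.input_ids[0, top].item())
        print(f"head {head}: top attention on {token!r}")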

WilKE: Wise-layer knowledge editor for lifelong knowledge editing

C Hu, P Cao, Y Chen, K Liu, J Zhao - arXiv preprint arXiv:2402.10987, 2024 - arxiv.org
Knowledge editing aims to rectify inaccuracies in large language models (LLMs) without
costly retraining for outdated or erroneous knowledge. However, current knowledge editing …
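
Methods in this line typically locate a layer to edit before applying a weight update. The sketch below shows one plausible layer-selection heuristic (a logit-lens projection of each layer's hidden state onto the target token); it is an illustrative stand-in under stated assumptions, not WilKE's actual criterion or edit step:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    prompt, target = "The capital of France is", " Paris"  # toy fact
    tid = tok(target).input_ids[0]
    ids = tok(prompt, return_tensors="pt")

    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)

    # Score each layer: project its last-position hidden state through the
    # final layer norm and unembedding, then read off the target logit.
    scores = []
    for hs in out.hidden_states[1:]:  # skip the embedding layer
        logits = model.lm_head(model.transformer.ln_f(hs[:, -1]))
        scores.append(logits[0, tid].item())

    best = max(range(len(scores)), key=scores.__getitem__)
    print(f"candidate edit layer: {best} (target logit {scores[best]:.2f})")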

Knowledge circuits in pretrained transformers

Y Yao, N Zhang, Z Xi, M Wang, Z Xu, S Deng… - arXiv preprint arXiv…, 2024 - arxiv.org
The remarkable capabilities of modern large language models are rooted in their vast
repositories of knowledge encoded within their parameters, enabling them to perceive the …
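
A minimal circuit-probing sketch in this spirit: knock out one attention head at a time and watch the correct-token logit move. Zero-ablation at head granularity, the gpt2 checkpoint, and the reporting threshold are simplifying assumptions for illustration, not the paper's discovery procedure:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("The Eiffel Tower is located in the city of", return_tensors="pt")
    tid = tok(" Paris").input_ids[0]

    def target_logit():
        with torch.no_grad():
            return model(**ids).logits[0, -1, tid].item()

    base = target_logit()
    n_heads = model.config.n_head
    head_dim = model.config.n_embd // n_heads

    for layer in range(model.config.n_layer):
        attn = model.transformer.h[layer].attn
        for head in range(n_heads):
            sl = slice(head * head_dim, (head + 1) * head_dim)
            def ablate(module, args, sl=sl):
                x = args[0].clone()
                x[..., sl] = 0.0  # zero this head's slice before the output projection
                return (x,)
            hook = attn.c_proj.register_forward_pre_hook(ablate)
            drop = base - target_logit()
            hook.remove()
            if drop > 0.5:  # arbitrary reporting threshold
                print(f"L{layer}H{head}: logit drop {drop:.2f}")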

Mechanistic understanding and mitigation of language model non-factual hallucinations

L Yu, M Cao, JCK Cheung, Y Dong - arXiv preprint arXiv:2403.18167, 2024 - arxiv.org
State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that
misalign with world knowledge. To explore the mechanistic causes of these hallucinations …

Open Problems in Mechanistic Interpretability

L Sharkey, B Chughtai, J Batson, J Lindsey… - arXiv preprint arXiv…, 2025 - arxiv.org
Mechanistic interpretability aims to understand the computational mechanisms underlying
neural networks' capabilities in order to accomplish concrete scientific and engineering …

OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System

N Zhang, Z Xi, Y Luo, P Wang, B Tian, Y Yao… - arXiv preprint arXiv…, 2024 - arxiv.org
Knowledge representation has been a central aim of AI since its inception. Symbolic
Knowledge Graphs (KGs) and neural Large Language Models (LLMs) can both represent …

Activation scaling for steering and interpreting language models

N Stoehr, K Du, V Snæbjarnarson, R West… - arXiv preprint arXiv…, 2024 - arxiv.org
Given the prompt "Rome is in", can we steer a language model to flip its prediction of an
incorrect token "France" to a correct token "Italy" by only multiplying a few relevant activation …