Knowledge mechanisms in large language models: A survey and perspective

M Wang, Y Yao, Z Xu, S Qiao, S Deng, P Wang… - arXiv preprint arXiv…, 2024 - arxiv.org
Understanding knowledge mechanisms in Large Language Models (LLMs) is crucial for
advancing towards trustworthy AGI. This paper reviews knowledge mechanism analysis …

Knowledge conflicts for LLMs: A survey

R Xu, Z Qi, Z Guo, C Wang, H Wang, Y Zhang… - arXiv preprint arXiv…, 2024 - arxiv.org
This survey provides an in-depth analysis of knowledge conflicts for large language models
(LLMs), highlighting the complex challenges they encounter when blending contextual and …

Finding visual task vectors

A Hojel, Y Bai, T Darrell, A Globerson, A Bar - European Conference on …, 2024 - Springer
Visual Prompting is a technique for teaching models to perform a visual task via in-context
examples, without any additional training. In this work, we analyze the activations of MAE …
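
For intuition, here is a minimal sketch of the task-vector idea transposed to a small language model (the paper itself patches activations of MAE-VQGAN on image grids, which requires image data); the gpt2 checkpoint, layer index, and prompts are illustrative assumptions, not the paper's setup:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    site = model.transformer.h[5]  # illustrative layer choice, not from the paper

    # 1) Average the last-position hidden state over in-context prompts.
    acts = []
    hook = site.register_forward_hook(
        lambda m, i, o: acts.append(o[0][:, -1].detach()))
    for p in ["big -> small, hot -> cold, tall ->",
              "up -> down, fast -> slow, wet ->"]:
        with torch.no_grad():
            model(**tok(p, return_tensors="pt"))
    hook.remove()
    task_vector = torch.cat(acts).mean(0)

    # 2) Patch the averaged vector into a zero-shot run at the same site.
    def patch(module, inputs, output):
        hidden = output[0].clone()
        hidden[:, -1] = task_vector  # overwrite the last position only
        return (hidden,) + output[1:]

    hook = site.register_forward_hook(patch)
    with torch.no_grad():
        logits = model(**tok("loud ->", return_tensors="pt")).logits[0, -1]
    hook.remove()
    print(tok.decode(logits.argmax().item()))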

Attention heads of large language models: A survey

Z Zheng, Y Wang, Y Huang, S Song, M Yang… - arXiv preprint arXiv…, 2024 - arxiv.org
Since the advent of ChatGPT, Large Language Models (LLMs) have excelled in various
tasks but remain black-box systems. Consequently, the reasoning bottlenecks of LLMs …
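
As a concrete starting point for the per-head analysis this survey taxonomizes, the following hedged sketch inspects raw attention patterns via Hugging Face transformers; the gpt2 checkpoint, the layer index, and the prompt are arbitrary illustrative choices:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("When Mary and John went to the store, John gave a drink to",
              return_tensors="pt")
    with torch.no_grad():
        out = model(**ids, output_attentions=True)

    # out.attentions holds one (batch, n_head, seq, seq) tensor per layer.
    layer_attn = out.attentions[5][0]  # layer 5, illustrative
    for head in range(layer_attn.shape[0]):
        # Which earlier token does the final position attend to most?
        top = layer_attn[head, -1].argmax().item()
        token = tok.decode(ids.input_ids[0, top].item())
        print(f"head {head}: top attention on {token!r}")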

WilKE: Wise-layer knowledge editor for lifelong knowledge editing

C Hu, P Cao, Y Chen, K Liu, J Zhao - arXiv preprint arXiv:2402.10987, 2024 - arxiv.org
Knowledge editing aims to rectify inaccuracies in large language models (LLMs) without
costly retraining for outdated or erroneous knowledge. However, current knowledge editing …
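
Methods in this line typically locate a layer to edit before applying a weight update. The sketch below shows one plausible layer-selection heuristic (a logit-lens projection of each layer's hidden state onto the target token); it is an illustrative stand-in under stated assumptions, not WilKE's actual criterion or edit step:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    prompt, target = "The capital of France is", " Paris"  # toy fact
    tid = tok(target).input_ids[0]
    ids = tok(prompt, return_tensors="pt")

    with torch.no_grad():
        out = model(**ids, output_hidden_states=True)

    # Score each layer: project its last-position hidden state through the
    # final layer norm and unembedding, then read off the target logit.
    scores = []
    for hs in out.hidden_states[1:]:  # skip the embedding layer
        logits = model.lm_head(model.transformer.ln_f(hs[:, -1]))
        scores.append(logits[0, tid].item())

    best = max(range(len(scores)), key=scores.__getitem__)
    print(f"candidate edit layer: {best} (target logit {scores[best]:.2f})")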

Knowledge circuits in pretrained transformers

Y Yao, N Zhang, Z Xi, M Wang, Z Xu, S Deng… - arXiv preprint arXiv…, 2024 - arxiv.org
The remarkable capabilities of modern large language models are rooted in their vast
repositories of knowledge encoded within their parameters, enabling them to perceive the …
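
A minimal circuit-probing sketch in this spirit: knock out one attention head at a time and watch the correct-token logit move. Zero-ablation at head granularity, the gpt2 checkpoint, and the reporting threshold are simplifying assumptions for illustration, not the paper's discovery procedure:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("The Eiffel Tower is located in the city of", return_tensors="pt")
    tid = tok(" Paris").input_ids[0]

    def target_logit():
        with torch.no_grad():
            return model(**ids).logits[0, -1, tid].item()

    base = target_logit()
    n_heads = model.config.n_head
    head_dim = model.config.n_embd // n_heads

    for layer in range(model.config.n_layer):
        attn = model.transformer.h[layer].attn
        for head in range(n_heads):
            sl = slice(head * head_dim, (head + 1) * head_dim)
            def ablate(module, args, sl=sl):
                x = args[0].clone()
                x[..., sl] = 0.0  # zero this head's slice before the output projection
                return (x,)
            hook = attn.c_proj.register_forward_pre_hook(ablate)
            drop = base - target_logit()
            hook.remove()
            if drop > 0.5:  # arbitrary reporting threshold
                print(f"L{layer}H{head}: logit drop {drop:.2f}")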

Mechanistic understanding and mitigation of language model non-factual hallucinations

L Yu, M Cao, JCK Cheung, Y Dong - arXiv preprint arXiv:2403.18167, 2024 - arxiv.org
State-of-the-art language models (LMs) sometimes generate non-factual hallucinations that
misalign with world knowledge. To explore the mechanistic causes of these hallucinations …

Open Problems in Mechanistic Interpretability

L Sharkey, B Chughtai, J Batson, J Lindsey… - arXiv preprint arXiv…, 2025 - arxiv.org
Mechanistic interpretability aims to understand the computational mechanisms underlying
neural networks' capabilities in order to accomplish concrete scientific and engineering …

OneEdit: A Neural-Symbolic Collaboratively Knowledge Editing System

N Zhang, Z Xi, Y Luo, P Wang, B Tian, Y Yao… - arXiv preprint arXiv…, 2024 - arxiv.org
Knowledge representation has been a central aim of AI since its inception. Symbolic
Knowledge Graphs (KGs) and neural Large Language Models (LLMs) can both represent …

Activation scaling for steering and interpreting language models

N Stoehr, K Du, V Snæbjarnarson, R West… - arXiv preprint arXiv…, 2024 - arxiv.org
Given the prompt "Rome is in", can we steer a language model to flip its prediction of an
incorrect token "France" to a correct token "Italy" by only multiplying a few relevant activation …