A primer on the inner workings of transformer-based language models

J Ferrando, G Sarti, A Bisazza, M Costa-jussà - 2024 - research.rug.nl
The rapid progress of research aimed at interpreting the inner workings of advanced
language models has highlighted a need for contextualizing the insights gained from years …

A survey on uncertainty quantification of large language models: Taxonomy, open research challenges, and future directions

O Shorinwa, Z Mei, J Lidard, AZ Ren… - arXiv preprint arXiv …, 2024 - arxiv.org
The remarkable performance of large language models (LLMs) in content generation,
coding, and common-sense reasoning has spurred widespread integration into many facets …

Are you still on track!? Catching LLM Task Drift with Activations

S Abdelnabi, A Fay, G Cherubin, A Salem… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models are commonly used in retrieval-augmented applications to execute
user instructions based on data from external sources. For example, modern search engines …
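
The title points at the mechanism: compare the model's activations before and after it reads external data, and flag cases where the delta looks like an injected instruction. Below is a heavily hedged sketch, not the paper's method; the linear probe and the threshold are assumptions.

```python
import torch
import torch.nn as nn

# Hypothetical detector: a linear probe on the difference between activations
# captured before and after the model ingests external text.
probe = nn.Linear(768, 1)  # assumed trained on labeled drift/no-drift examples

def task_drift_score(acts_before: torch.Tensor, acts_after: torch.Tensor) -> torch.Tensor:
    """Higher score = larger shift in the task representation."""
    delta = acts_after - acts_before
    return torch.sigmoid(probe(delta))

score = task_drift_score(torch.randn(1, 768), torch.randn(1, 768))
# Flag a drift if the score exceeds a calibrated threshold (value hypothetical).
```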

Unpacking SDXL Turbo: Interpreting text-to-image models with sparse autoencoders

V Surkov, C Wendler, M Terekhov… - arXiv preprint arXiv …, 2024 - arxiv.org
Sparse autoencoders (SAEs) have become a core ingredient in the reverse engineering of
large language models (LLMs). For LLMs, they have been shown to decompose …

Evaluating open-source sparse autoencoders on disentangling factual knowledge in GPT-2 small

M Chaudhary, A Geiger - arXiv preprint arXiv:2409.04478, 2024 - arxiv.org
A popular new method in mechanistic interpretability is to train high-dimensional sparse
autoencoders (SAEs) on neuron activations and use SAE features as the atomic units of …
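
The recipe this snippet names is compact enough to sketch. Below is a minimal, illustrative SAE in PyTorch, not any of these papers' released code: activations are encoded into an overcomplete, non-negative feature vector and decoded back, trained with a reconstruction loss plus an L1 sparsity penalty. All dimensions and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Minimal SAE: map d_model activations to an overcomplete,
    non-negative feature space and back. Illustrative only."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def forward(self, acts: torch.Tensor):
        feats = torch.relu(self.encoder(acts))  # sparse feature activations
        recon = self.decoder(feats)
        return recon, feats

# Hypothetical training step on cached activations (random stand-ins here):
sae = SparseAutoencoder(d_model=768, d_features=16384)
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
acts = torch.randn(64, 768)
recon, feats = sae(acts)
loss = (recon - acts).pow(2).mean() + 1e-3 * feats.abs().mean()  # MSE + L1 sparsity
opt.zero_grad()
loss.backward()
opt.step()
```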

Sparse autoencoders reveal universal feature spaces across large language models

M Lan, P Torr, A Meek, A Khakzar, D Krueger… - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate feature universality in large language models (LLMs), a research field that
aims to understand how different models similarly represent concepts in the latent spaces of …
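
One simple way to make "similarly represent concepts" concrete is to compare SAE feature directions across two models, e.g. via pairwise cosine similarity. The sketch below assumes equal activation widths purely for illustration; it is one possible analysis, not necessarily the paper's.

```python
import torch
import torch.nn.functional as F

# Hypothetical: rows are SAE feature directions (decoder columns) from SAEs
# trained on two different models; equal width is assumed for illustration.
feats_a = F.normalize(torch.randn(1024, 768), dim=1)
feats_b = F.normalize(torch.randn(1024, 768), dim=1)

# Pairwise cosine similarity; a high best match per row hints that the
# other model has a corresponding feature.
sim = feats_a @ feats_b.T
best_match = sim.max(dim=1).values
print(f"mean best-match similarity: {best_match.mean().item():.3f}")
```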

What makes your model a low-empathy or warmth person: Exploring the origins of personality in LLMs

S Yang, S Zhu, R Bao, L Liu, Y Cheng, L Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated remarkable capabilities in generating
human-like text and exhibiting personality traits similar to those in humans. However, the …

Applying sparse autoencoders to unlearn knowledge in language models

E Farrell, YT Lau, A Conmy - arXiv preprint arXiv:2410.19278, 2024 - arxiv.org
We investigate whether sparse autoencoders (SAEs) can be used to remove knowledge
from language models. We use the biology subset of the Weapons of Mass Destruction …
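
The snippet doesn't show the procedure, but the generic SAE-ablation mechanism it builds on can be sketched: clamp the coefficients of targeted features before decoding, so the reconstructed activation no longer carries those concepts. Which indices encode the targeted knowledge is a placeholder here.

```python
import torch
import torch.nn as nn

# Hypothetical SAE decoder and a batch of sparse feature activations.
decoder = nn.Linear(16384, 768)
feats = torch.relu(torch.randn(4, 16384))

def ablate_features(feats: torch.Tensor, decoder: nn.Linear,
                    indices: list[int], clamp_value: float = 0.0) -> torch.Tensor:
    """Clamp the selected SAE features before decoding, so the reconstructed
    activation no longer carries the concepts they encode."""
    edited = feats.clone()
    edited[:, indices] = clamp_value
    return decoder(edited)

# Indices standing in for features that fired on the targeted knowledge.
edited_acts = ablate_features(feats, decoder, indices=[12, 345])
```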

Llama Scope: Extracting millions of features from Llama-3.1-8B with sparse autoencoders

Z He, W Shu, X Ge, L Chen, J Wang, Y Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Sparse Autoencoders (SAEs) have emerged as a powerful unsupervised method for
extracting sparse representations from language models, yet scalable training remains a …

Improving steering vectors by targeting sparse autoencoder features

S Chalnev, M Siu, A Conmy - arXiv preprint arXiv:2411.02193, 2024 - arxiv.org
To control the behavior of language models, steering methods attempt to ensure that outputs
of the model satisfy specific pre-defined properties. Adding steering vectors to the model is a …
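
The mechanism the snippet names, adding a fixed vector to a hidden state during the forward pass, can be sketched with a PyTorch forward hook; the layer, scale, and vector below are all illustrative assumptions, not the paper's setup.

```python
import torch

# Hypothetical steering setup: a fixed direction added to one layer's
# residual-stream output during the forward pass.
steering_vector = torch.randn(768)  # stand-in for a derived direction
scale = 4.0

def steering_hook(module, inputs, output):
    # Many transformer blocks return tuples; handle both cases.
    if isinstance(output, tuple):
        return (output[0] + scale * steering_vector,) + output[1:]
    return output + scale * steering_vector

# Usage (HuggingFace-style model; the layer choice is hypothetical):
# handle = model.transformer.h[10].register_forward_hook(steering_hook)
# ...generate...
# handle.remove()
```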