Not all language model features are linear
Recent work has proposed that language models perform computation by manipulating one-dimensional representations of concepts ("features") in activation space. In contrast, we …
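For orientation, here is a minimal sketch of what a one-dimensional ("linear") feature means operationally: a single direction in activation space along which a thresholded projection separates a concept. The synthetic activations, planted direction, and difference-of-means estimator are illustrative assumptions, not the paper's method.

```python
# Illustrative sketch only: a "linear feature" as a single direction w such
# that projecting activations onto w separates a concept. The data below is
# a synthetic stand-in for hidden states collected from a model.
import numpy as np

def concept_direction(acts: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Unit-norm difference-of-means direction for a binary concept."""
    d = acts[labels].mean(axis=0) - acts[~labels].mean(axis=0)
    return d / np.linalg.norm(d)

def one_d_probe_accuracy(acts: np.ndarray, labels: np.ndarray) -> float:
    """Accuracy of a threshold classifier along the concept direction."""
    proj = acts @ concept_direction(acts, labels)
    thresh = (proj[labels].mean() + proj[~labels].mean()) / 2
    return float(((proj > thresh) == labels).mean())

rng = np.random.default_rng(0)
acts = rng.standard_normal((400, 64))
labels = rng.random(400) < 0.5
acts[labels] += 3.0 * (np.arange(64) == 0)   # plant the concept along one axis
print(one_d_probe_accuracy(acts, labels))    # well above chance (~0.93)
```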
Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers
Large Language Models (LLMs) have the capacity to store and recall facts. Through experimentation with open-source models, we observe that this ability to retrieve facts can …
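For intuition about the associative-memory framing, here is a textbook linear associative memory in which facts are stored as summed key-value outer products and recalled with one matrix multiply. This toy is standard material, not the paper's model; the random keys and values are stand-ins for learned representations.

```python
# Toy linear associative memory: store (key, value) pairs as outer products,
# recall a value with a single matmul. Works because random unit keys in
# high dimension are nearly orthogonal, so crosstalk stays small.
import numpy as np

rng = np.random.default_rng(0)
d, n_facts = 256, 20
keys = rng.standard_normal((n_facts, d))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = rng.standard_normal((n_facts, d))

W = values.T @ keys                         # d x d memory matrix

recalled = W @ keys[3]                      # query with a stored key
print(int(np.argmax(values @ recalled)))    # recovers index 3
```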
All or none: Identifiable linear properties of next-token predictors in language modeling
We analyze identifiability as a possible explanation for the ubiquity of linear properties across language models, such as the vector difference between the representations of …
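One concrete instance of such a linear property is that representation differences encoding the same relation are (near-)parallel across word pairs. A hedged sketch of that check follows; the `emb` mapping and the example word pairs are hypothetical, not taken from the paper.

```python
# Sketch of a parallelism check for a relational "difference vector".
import numpy as np

def relation_cosine(emb: dict, pair_a: tuple, pair_b: tuple) -> float:
    """Cosine similarity between two representation-difference vectors."""
    da = emb[pair_a[0]] - emb[pair_a[1]]
    db = emb[pair_b[0]] - emb[pair_b[1]]
    return float(da @ db / (np.linalg.norm(da) * np.linalg.norm(db)))

# Hypothetical usage with embeddings collected from a real model:
#   relation_cosine(emb, ("big", "biggest"), ("small", "smallest"))
# A value near 1 means the relation is encoded as a shared direction.
```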
Intrinsic self-correction for enhanced morality: An analysis of internal mechanisms and the superficial hypothesis
Large Language Models (LLMs) are capable of producing content that perpetuates stereotypes, discrimination, and toxicity. The recently proposed moral self-correction is a …
On the universal truthfulness hyperplane inside LLMs
While large language models (LLMs) have demonstrated remarkable abilities across various fields, hallucination remains a significant challenge. Recent studies have explored …
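A truthfulness hyperplane of this kind can be searched for with a generic linear probe on hidden states. A minimal sketch, assuming hidden states `X` for statements and binary truth labels `y` (synthetic placeholders below); this is not the paper's exact training recipe.

```python
# Generic linear probe: fit a separating hyperplane over hidden states.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data; in practice X holds LLM hidden states for statements
# and y the corresponding truth labels.
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 64))
y = (X @ rng.standard_normal(64) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("held-out accuracy:", probe.score(X_te, y_te))
# probe.coef_ is the hyperplane normal; probe.intercept_ its offset.
```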
Causal language modeling can elicit search and reasoning capabilities on logic puzzles
Causal language modeling using the Transformer architecture has yielded remarkable capabilities in Large Language Models (LLMs) over the last few years. However, the extent …
PaCE: Parsimonious Concept Engineering for Large Language Models
Large Language Models (LLMs) are being used for a wide variety of tasks. While they are capable of generating human-like responses, they can also produce undesirable output …
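The decompose-then-remove idea behind activation-space concept engineering can be sketched as sparse coding over a concept dictionary followed by subtracting flagged components. The random dictionary, activation vector, and `undesired` indices below are stand-ins (PaCE itself builds a large curated concept dictionary), so treat this as a shape of the approach, not the method.

```python
# Sketch: sparse-code an activation over a concept dictionary, then subtract
# the contributions of flagged concepts.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
d, n_concepts = 128, 32
D = rng.standard_normal((d, n_concepts))   # concept dictionary, one column per concept
D /= np.linalg.norm(D, axis=0)
h = rng.standard_normal(d)                 # one activation vector

code = Lasso(alpha=0.05, fit_intercept=False).fit(D, h).coef_   # sparse code for h
undesired = [3, 17]                        # hypothetical indices of concepts to remove
h_clean = h - D[:, undesired] @ code[undesired]
```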
Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning
Transformer-based large language models (LLMs) have displayed remarkable creative prowess and emergence capabilities. Existing empirical studies have revealed a strong …
The Geometry of Categorical and Hierarchical Concepts in Large Language Models
Understanding how semantic meaning is encoded in the representation spaces of large language models is a fundamental problem in interpretability. In this paper, we study the two …
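One geometric check in this line of work is whether hierarchical relations correspond to orthogonal directions. A hedged sketch follows; the concept directions named in the usage comment (an "animal" direction and "dog"/"cat" directions) are hypothetical and would have to be estimated from the model's representation space.

```python
# Sketch of an orthogonality check between a parent-category direction and
# a within-category difference direction.
import numpy as np

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical usage with directions estimated from a model:
#   cosine(animal_dir, dog_dir - cat_dir)
# Near-zero cosine is what an orthogonal hierarchy predicts; note that
# random high-dimensional vectors are also near-orthogonal, so a real
# test needs a calibrated baseline.
```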
ResiDual Transformer Alignment with Spectral Decomposition
When examined through the lens of their residual streams, a puzzling property emerges in transformer networks: residual contributions (e.g., attention heads) sometimes specialize in …
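A spectral look at a residual contribution can be as simple as an SVD over one attention head's output vectors cached across many tokens. A minimal sketch with a synthetic stand-in for the cached outputs; the paper's actual analysis is more involved than this.

```python
# Sketch: singular value spectrum of one head's residual-stream writes.
import numpy as np

rng = np.random.default_rng(0)
head_out = rng.standard_normal((1000, 256))   # stand-in for one head's cached outputs

head_out = head_out - head_out.mean(axis=0)   # center before the SVD
_, s, _ = np.linalg.svd(head_out, full_matrices=False)
var = s**2 / np.sum(s**2)
print("variance in top 5 components:", var[:5].sum())
# A sharply concentrated spectrum indicates the head writes into a
# low-rank subspace of the residual stream.
```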