Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information

Y Park, C Yoon, J Park, M Jeong, J Kang - arXiv preprint arXiv:2502.14258, 2025 - arxiv.org
While the ability of language models to elicit facts has been widely investigated, how they
handle temporally changing facts remains underexplored. We discover Temporal Heads …

Sparse Autoencoders Can Interpret Randomly Initialized Transformers

T Heap, T Lawson, L Farnik, L Aitchison - arXiv preprint arXiv:2501.17727, 2025 - arxiv.org
Sparse autoencoders (SAEs) are an increasingly popular technique for interpreting the
internal representations of transformers. In this paper, we apply SAEs to 'interpret' random …

Neurosymbolic AI and Mechanistic Interpretability: Can They Align in the Artificial General Intelligence Era?

AI Weinberg - 2025 - researchgate.net
Neurosymbolic AI (NeSy) offers a promising approach to addressing the interpretability
challenges in artificial intelligence by bridging neural networks and symbolic reasoning. This …