Does Time Have Its Place? Temporal Heads: Where Language Models Recall Time-specific Information

Y Park, C Yoon, J Park, M Jeong, J Kang - arXiv preprint arXiv:2502.14258, 2025 - arxiv.org
While the ability of language models to elicit facts has been widely investigated, how they
handle temporally changing facts remains underexplored. We discover Temporal Heads …

Sparse Autoencoders Can Interpret Randomly Initialized Transformers

T Heap, T Lawson, L Farnik, L Aitchison - arXiv preprint arXiv:2501.17727, 2025 - arxiv.org
Sparse autoencoders (SAEs) are an increasingly popular technique for interpreting the
internal representations of transformers. In this paper, we apply SAEs to 'interpret' random …

Neurosymbolic AI and Mechanistic Interpretability: Can They Align in the Artificial General Intelligence Era?

AI Weinberg - 2025 - researchgate.net
Neurosymbolic AI (NeSy) offers a promising approach to addressing the interpretability
challenges in artificial intelligence by bridging neural networks and symbolic reasoning. This …