Feature contamination: Neural networks learn uncorrelated features and fail to generalize

T Zhang, C Zhao, G Chen, Y Jiang, F Chen - arXiv preprint arXiv …, 2024 - arxiv.org
Learning representations that generalize under distribution shifts is critical for building
robust machine learning models. However, despite significant efforts in recent years …

All or none: Identifiable linear properties of next-token predictors in language modeling

E Marconato, S Lachapelle, S Weichwald… - arXiv preprint arXiv …, 2024 - arxiv.org
We analyze identifiability as a possible explanation for the ubiquity of linear properties
across language models, such as the vector difference between the representations of …

Generalization from Starvation: Hints of Universality in LLM Knowledge Graph Learning

DD Baek, Y Li, M Tegmark - arXiv preprint arXiv:2410.08255, 2024 - arxiv.org
Motivated by interpretability and reliability, we investigate how neural networks represent
knowledge during graph learning. We find hints of universality, where equivalent …

Harmonic Loss Trains Interpretable AI Models

DD Baek, Z Liu, R Tyagi, M Tegmark - arXiv preprint arXiv:2502.01628, 2025 - arxiv.org
In this paper, we introduce **harmonic loss** as an alternative to the standard cross-entropy
loss for training neural networks and large language models (LLMs). Harmonic loss enables …

Representational Analysis of Binding in Language Models

Q Dai, B Heinzerling, K Inui - arXiv preprint arXiv:2409.05448, 2024 - arxiv.org
Entity tracking is essential for complex reasoning. To perform in-context entity tracking,
language models (LMs) must bind an entity to its attribute (e.g., bind a container to its content) …

On Representational Dissociation of Language and Arithmetic in Large Language Models

R Kisako, T Kuribayashi, R Sasano - arXiv preprint arXiv:2502.11932, 2025 - arxiv.org
The association between language and (non-linguistic) thinking ability in humans has long
been debated, and recently, neuroscientific evidence of brain activity patterns has been …

Think-to-Talk or Talk-to-Think? When LLMs Come Up with an Answer in Multi-Step Reasoning

K Kudo, Y Aoki, T Kuribayashi, S Sone… - arXiv preprint arXiv …, 2024 - arxiv.org
This study investigates the internal reasoning mechanism of language models during
symbolic multi-step reasoning, motivated by the question of whether chain-of-thought (CoT) …