Propositional interpretability in artificial intelligence
DJ Chalmers - arXiv preprint arXiv:2501.15740, 2025 - arxiv.org
Mechanistic interpretability is the program of explaining what AI systems are doing in terms
of their internal mechanisms. I analyze some aspects of the program, along with setting out …