Mechanistic Interpretability for AI Safety--A Review

L Bereska, E Gavves - arXiv preprint …

On the effect of dropping layers of pre-trained transformer models
H Sajjad, F Dalvi, N Durrani, P Nakov - Computer Speech & Language, 2023 - Elsevier
Transformer-based NLP models are trained using hundreds of millions or even billions of
parameters, limiting their applicability in computationally constrained environments. While …