Google Scholar

Towards Multi-dimensional Explanation Alignment for Medical Classification

L Hu, S Lai, W Chen, H **ao, H Lin… - Advances in …, 2025 - proceedings.neurips.cc

The lack of interpretability in the field of medical image analysis has significant ethical and
legal implications. Existing interpretable methods in this domain encounter several …

Cited by 1 · All 4 versions

[PDF] arxiv.org

Steering language model refusal with sparse autoencoders

K O'Brien, D Majercak, X Fernandes, R Edgar… - arXiv preprint arXiv …, 2024 - arxiv.org

Responsible practices for deploying language models include guiding models to recognize
and refuse answering prompts that are considered unsafe, while complying with safe …

Cited by 1 · All 2 versions

[PDF] arxiv.org

MQA-KEAL: Multi-hop question answering under knowledge editing for Arabic language

MA Ali, N Daftardar, M Waheed, J Qin… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have demonstrated significant capabilities across
numerous application domains. A key challenge is to keep these models updated with latest …

Cited by 1 · All 3 versions

[PDF] arxiv.org

Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning

L Zhang, L Hu, D Wang - arXiv preprint arXiv:2502.09022, 2025 - arxiv.org

Transformer-based language models have achieved notable success, yet their internal
reasoning mechanisms remain largely opaque due to complex non-linear interactions and …

[PDF] arxiv.org

EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification

L Zhang, W Dong, Z Zhang, S Yang, L Hu, N Liu… - arXiv preprint arXiv …, 2025 - arxiv.org

Understanding the internal mechanisms of transformer-based language models remains
challenging. Mechanistic interpretability based on circuit discovery aims to reverse engineer …

What makes your model a low-empathy or warmth person: Exploring the origins of personality in LLMs

Towards Multi-dimensional Explanation Alignment for Medical Classification

Steering language model refusal with sparse autoencoders

MQA-KEAL: Multi-hop question answering under knowledge editing for Arabic language

Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning

EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification