From anecdotal evidence to quantitative evaluation methods: A systematic review on evaluating explainable AI

M Nauta, J Trienes, S Pathak, E Nguyen… - ACM Computing …, 2023 - dl.acm.org
The rising popularity of explainable artificial intelligence (XAI) to understand high-performing
black boxes raised the question of how to evaluate explanations of machine learning (ML) …

Post-hoc interpretability for neural NLP: A survey

A Madsen, S Reddy, S Chandar - ACM Computing Surveys, 2022 - dl.acm.org
Neural networks for NLP are becoming increasingly complex and widespread, and there is a
growing concern about whether these models are responsible to use. Explaining models helps to address …

Towards automated circuit discovery for mechanistic interpretability

A Conmy, A Mavor-Parker, A Lynch… - Advances in …, 2023 - proceedings.neurips.cc
Through considerable effort and intuition, several recent works have reverse-engineered
nontrivial behaviors of transformer models. This paper systematizes the mechanistic …

Explainability for large language models: A survey

H Zhao, H Chen, F Yang, N Liu, H Deng, H Cai… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated impressive capabilities in natural
language processing. However, their internal mechanisms are still unclear and this lack of …

Language in a bottle: Language model guided concept bottlenecks for interpretable image classification

Y Yang, A Panagopoulou, S Zhou… - Proceedings of the …, 2023 - openaccess.thecvf.com
Concept Bottleneck Models (CBM) are inherently interpretable models that factor
model decisions into human-readable concepts. They allow people to easily understand …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

Interpretability in the wild: a circuit for indirect object identification in GPT-2 small

K Wang, A Variengien, A Conmy, B Shlegeris… - arXiv preprint arXiv …, 2022 - arxiv.org
Research in mechanistic interpretability seeks to explain behaviors of machine learning
models in terms of their internal components. However, most previous work either focuses …

Does localization inform editing? Surprising differences in causality-based localization vs. knowledge editing in language models

P Hase, M Bansal, B Kim… - Advances in Neural …, 2024 - proceedings.neurips.cc
Language models learn a great quantity of factual information during pretraining,
and recent work localizes this information to specific model weights like mid-layer MLP …

Transformer feed-forward layers are key-value memories

M Geva, R Schuster, J Berant, O Levy - arXiv preprint arXiv:2012.14913, 2020 - arxiv.org
Feed-forward layers constitute two-thirds of a transformer model's parameters, yet their role
in the network remains under-explored. We show that feed-forward layers in transformer …

Toward transparent AI: A survey on interpreting the inner structures of deep neural networks

T Räuker, A Ho, S Casper… - 2023 IEEE Conference …, 2023 - ieeexplore.ieee.org
The last decade of machine learning has seen drastic increases in scale and capabilities.
Deep neural networks (DNNs) are increasingly being deployed in the real world. However …